Mirror of https://github.com/ollama/ollama.git (synced 2026-06-03 22:13:30 -04:00)
gpt-oss: rename the arch "gptoss" -> "gpt-oss" (including the KV prefix), inject the missing `expert_feed_forward_length` derived from the ffn_gate_exps tensor shape, and rename the `attn_out`/`attn_sinks`/`ffn_norm` tensors to upstream's `attn_output`/`attn_sinks.weight`/`post_attention_norm`. Also remove the library/gpt-oss -> dhiltgen/gpt-oss redirect now that the compat shim handles the model directly.

lfm2: rename `output_norm.weight` -> `token_embd_norm.weight` and fix a stale `lfm2.feed_forward_length` (some Ollama blobs claim 12288 on a model whose ffn_gate is [2048, 8192]) by reading the real value off the ffn_gate tensor shape.

Add two helpers to compat-util: `copy_kv` (type-preserving generic KV copy) and `rename_kv_prefix` (bulk-copy every KV with a given prefix to a new prefix). Old keys are left in place, which is harmless because the loader queries by exact name and only the new prefix matters.

Tested locally: gpt-oss:20b and lfm2.5-thinking now load and generate coherently against an unmodified upstream llama-server build.
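
For illustration only, the behavior of the two helpers and the shape-derived fix can be sketched in Python over a plain dict. This is not the compat-util code: the dict-based metadata model, the function signatures, the `ffn_length_from_shape` name, and the sample values are all assumptions made for the sketch. In particular, taking the second dimension as the feed-forward width is inferred from the [2048, 8192] example above rather than confirmed.

```python
def copy_kv(kv, src, dst):
    """Copy kv[src] to kv[dst] without converting the value (type-preserving).

    Returns False when the source key is missing, so callers can skip it.
    """
    if src not in kv:
        return False
    kv[dst] = kv[src]
    return True


def rename_kv_prefix(kv, old_prefix, new_prefix):
    """Copy every key that starts with old_prefix to the same suffix under new_prefix.

    Old keys are deliberately left in place. Returns the number of keys copied.
    """
    copied = 0
    # Iterate over a snapshot because the dict grows while we copy.
    for key in list(kv):
        if key.startswith(old_prefix):
            suffix = key[len(old_prefix):]
            if copy_kv(kv, key, new_prefix + suffix):
                copied += 1
    return copied


def ffn_length_from_shape(shape):
    """Hypothetical helper: read the feed-forward width off a tensor shape.

    Assumes the layout [n_embd, n_ff, ...], i.e. the width is dimension 1.
    """
    return shape[1]


# Made-up metadata, shaped like the cases described in the commit message.
meta = {
    "general.architecture": "gptoss",
    "gptoss.block_count": 24,
    "lfm2.feed_forward_length": 12288,  # stale value claimed by the blob
}

rename_kv_prefix(meta, "gptoss.", "gpt-oss.")
meta["general.architecture"] = "gpt-oss"
meta["lfm2.feed_forward_length"] = ffn_length_from_shape([2048, 8192])
```

After these calls, `meta` holds both `gptoss.block_count` and `gpt-oss.block_count`, and `lfm2.feed_forward_length` reflects the tensor shape instead of the stale value. Copying rather than moving keeps the helper simple, and it matches the note that leftover keys are harmless to a loader that looks keys up by exact name.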