WARNING · RUNTIME UPDATE · 2026-06-13

llama.cpp ships b9626

▼ WHAT HAPPENED

llama.cpp cut release b9626 on 2026-06-13. Release notes excerpt:

"Add arch support for cohere2-MoE (#24260)
* Add arch support for cohere2-MoE
* Removed redundant gating_func checks
* Changed ffn lookup to prefer prefix_dense_intermediate_size
* Renamed arch to cohere2moe
* Removed redundant lmhead check and chat template changes
* Removed lm_head.weight check from modify tensors, load output tensor not required, fallback..."

▼ OPERATOR ANGLE

Read the full release notes and decide whether operators need to act. Read any migration notes before upgrading production runners. Publish if this release changes model compatibility, GPU backend behavior, memory use, quantization paths, security posture, migration requirements, or production serving reliability.
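As a rough illustration of the "decide whether operators need to act" step, the sketch below flags runners that would need an upgrade before serving a model of the newly added architecture. It assumes that only models whose architecture string is cohere2moe (the name given in the release notes excerpt) depend on b9626, and that build tags follow the bNNNN pattern and increase over time. The helper names and the fleet inventory are hypothetical and are not part of llama.cpp.

```python
import re

# Build number of the release covered by this pulse item.
REQUIRED_BUILD = 9626
# Architecture name taken from the release notes excerpt
# ("Renamed arch to cohere2moe").
NEW_ARCH = "cohere2moe"


def parse_build_tag(tag: str) -> int:
    """Turn a release tag such as 'b9626' into the integer 9626."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"not a build tag: {tag!r}")
    return int(match.group(1))


def needs_upgrade(runner_tag: str, model_arch: str) -> bool:
    """Return True when a runner should be upgraded before serving model_arch.

    Assumption: only models whose architecture is NEW_ARCH depend on this
    release; every other architecture is treated as unaffected.
    """
    if model_arch != NEW_ARCH:
        return False
    return parse_build_tag(runner_tag) < REQUIRED_BUILD


if __name__ == "__main__":
    # Hypothetical inventory: runner name -> (installed tag, model architecture).
    fleet = {
        "runner-a": ("b9500", "cohere2moe"),
        "runner-b": ("b9626", "cohere2moe"),
        "runner-c": ("b9100", "llama"),
    }
    for name, (tag, arch) in fleet.items():
        print(name, "upgrade" if needs_upgrade(tag, arch) else "ok")
```

A check like this covers model compatibility only. The other criteria listed above (GPU backend behavior, memory use, quantization paths, and so on) still require reading the full release notes.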
[pulse item] · runlocalai.co/pulse/gh-ggml-org-llama-cpp-b9626