WARNING · RUNTIME UPDATE · 2026-06-16
llama.cpp ships b9664
▼ WHAT HAPPENED
llama.cpp cut release b9664 on 2026-06-16. Release notes excerpt:
"sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (#24452)
* sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID
  Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and add Q5_K reordered DMMV coverage. Unsupported 3D reorder cases now fall back instead of aborting.
* sycl: extend MoE reorder to Q6_K mul_mat_id
  **ma..."
▼ OPERATOR ANGLE
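Read the full release notes to decide whether you need to act. The excerpt covers the SYCL backend's reordered Q4_K, Q5_K, and Q6_K MoE MUL_MAT_ID paths, so operators serving quantized MoE models on SYCL devices are the most likely to be affected. Test throughput and memory fit before pinning the new version. The release matters most if it changes model compatibility, GPU backend behavior, memory use, quantization paths, security posture, migration requirements, or production serving reliability.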
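One way to make the "test throughput and memory fit before pinning" step concrete is a small gate that compares your own measurements from the current build and the candidate build. The sketch below is illustrative only: `BenchResult`, `safe_to_pin`, the 5% regression tolerance, and the memory budget are all hypothetical names and defaults, not part of llama.cpp, and the numbers must come from benchmarks you run on your own hardware and models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One benchmark run (hypothetical container for your own measurements)."""
    tokens_per_sec: float  # generation throughput you measured
    peak_mem_gib: float    # peak device memory you observed, in GiB


def safe_to_pin(baseline: BenchResult,
                candidate: BenchResult,
                mem_budget_gib: float,
                max_regression: float = 0.05) -> bool:
    """Return True if the candidate build fits the memory budget and is not
    slower than the baseline by more than max_regression (0.05 means 5%).

    The default tolerance is an assumption; pick one that suits your workload.
    """
    if baseline.tokens_per_sec <= 0:
        raise ValueError("baseline throughput must be positive")

    # Memory fit: the candidate must stay within the device budget.
    fits = candidate.peak_mem_gib <= mem_budget_gib

    # Throughput: allow a small regression relative to the baseline.
    floor = baseline.tokens_per_sec * (1.0 - max_regression)
    fast_enough = candidate.tokens_per_sec >= floor

    return fits and fast_enough
```

Calling `safe_to_pin(old_run, new_run, mem_budget_gib=...)` with results from the same model, quantization, and prompt settings on both builds gives a simple yes/no signal; a failed check is a prompt to investigate, not proof of a regression.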
[pulse item] · runlocalai.co/pulse/gh-ggml-org-llama-cpp-b9664