llama.cpp ships b9718

▼ WHAT HAPPENED

llama.cpp cut release b9718 on 2026-06-19. Release notes excerpt:

"server : consolidate slot selection into get_available_slot (#24755)

Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested, the LCP similarity check still runs to enable proper prompt cache updates.

Assisted-by: pi:llama.cpp/Qwen3.6-27B

macOS/iOS: [macOS Apple Si..."
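To make the excerpt concrete, here is a minimal Python sketch of what "one function handles slot selection" could look like. It is an illustration only, not llama.cpp's C++ code: the `Slot` fields, the `min_similarity` threshold, the similarity formula (shared prefix length divided by prompt length), and the least-recently-used fallback are all assumptions. The one behavior taken from the release notes is that the LCP (longest common prefix) similarity is still computed when a specific slot id is requested.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    # Hypothetical slot record; field names are illustrative, not llama.cpp's.
    id: int
    cached_tokens: list = field(default_factory=list)
    last_used: int = 0
    busy: bool = False


def lcp_len(a, b):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def get_available_slot(slots, prompt, requested_id: Optional[int] = None,
                       min_similarity: float = 0.5):
    """Return (slot, similarity), or (None, 0.0) if nothing is usable.

    Assumed policy: honor a requested id if that slot is idle; otherwise
    prefer the idle slot whose cache shares the longest prefix with the
    prompt, falling back to the least recently used idle slot.
    """
    def similarity(slot):
        return lcp_len(slot.cached_tokens, prompt) / len(prompt) if prompt else 0.0

    if requested_id is not None:
        for slot in slots:
            if slot.id == requested_id and not slot.busy:
                # The similarity is still computed on the explicit-id path,
                # mirroring the behavior described in the release notes.
                return slot, similarity(slot)
        return None, 0.0

    idle = [s for s in slots if not s.busy]
    if not idle:
        return None, 0.0

    best = max(idle, key=similarity)
    if similarity(best) >= min_similarity:
        return best, similarity(best)

    lru = min(idle, key=lambda s: s.last_used)
    return lru, similarity(lru)
```

Folding the explicit-id lookup into the same function means both paths report a similarity value, so a caller can decide how much of the cached prompt to reuse regardless of how the slot was chosen.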

▼ OPERATOR ANGLE

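Read the full release notes and decide whether operators need to act. The excerpted change touches server slot selection and prompt cache updates, so multi-slot server deployments are the ones most likely to be affected. Test throughput and memory fit before pinning the new version. Publish if this changes model compatibility, GPU backend behavior, memory use, quantization paths, security posture, migration requirements, or production serving reliability.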
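The "test before pinning" step can be reduced to a small gate once you have measured numbers for the old and new builds. The sketch below is one hedged way to do it: the function name, the 5% throughput tolerance, and the 10% memory headroom are all assumptions chosen for illustration, and the memory check deliberately ignores KV cache and other runtime buffers, so treat it as a lower bound.

```python
def safe_to_pin(baseline_tps, candidate_tps, model_bytes, free_bytes,
                max_drop=0.05, headroom=0.10):
    """Return True if the candidate build passes both checks.

    baseline_tps / candidate_tps: tokens per second you measured yourself
    on the current and the new build (same model, same settings).
    model_bytes: size of the model file; free_bytes: memory available.
    max_drop and headroom are illustrative defaults, not recommendations.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")

    # Relative throughput change; negative means the new build is slower.
    change = (candidate_tps - baseline_tps) / baseline_tps
    throughput_ok = change >= -max_drop

    # Crude fit check: model size plus a safety margin must fit in memory.
    memory_ok = model_bytes * (1 + headroom) <= free_bytes

    return throughput_ok and memory_ok
```

Feed it your own measurements from both builds; a False result means the new version either slowed down past the tolerance or leaves too little memory margin.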

SOURCE: https://github.com/ggml-org/llama.cpp/releases/tag/b9718 [GITHUB-RELEASE]

[pulse item] · runlocalai.co/pulse/gh-ggml-org-llama-cpp-b9718