Pulse — what changed for operators · RunLocalAI

RUNTIME · 1w ago

llama.cpp ships b9718

llama.cpp cut release b9718 on 2026-06-19. Release notes excerpt: "server : consolidate slot selection into get_available_slot (#24755) Absorb get_slot_by_id logic into get_available_slot so slot selection is handled by a single function call. When a specific slot id is requested...

RUNTIME · 1w ago

llama.cpp ships b9721

llama.cpp cut release b9721 on 2026-06-19. Release notes excerpt: "sync : ggml **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9721/llama-b9721-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLE...

RUNTIME · 1w ago

llama.cpp ships b9722

llama.cpp cut release b9722 on 2026-06-19. Release notes excerpt: "server: fix non-bound n_discard value (ctx shifting) (#24786) * server: fix non-bound n_discard value * Update tools/server/server-context.cpp Co-authored-by: Georgi Gerganov --------- Co-authored-by: Georgi Gerga...

RUNTIME · 1w ago

llama.cpp ships b9714

llama.cpp cut release b9714 on 2026-06-19. Release notes excerpt: "server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) * server: add "X-Accel-Buffering": "no" header to streaming endpoints This header tells Nginx (as a reverse proxy) to NOT buffer respons...
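
The header named in this excerpt is a standard Nginx mechanism: a response carrying `X-Accel-Buffering: no` is passed through the proxy without buffering. As a rough illustration of what a streaming endpoint sends, here is a minimal Python sketch; the helper names `sse_headers` and `sse_event` are hypothetical and are not part of llama.cpp.

```python
import json


def sse_headers():
    """Response headers for a server-sent-events stream behind Nginx (illustrative)."""
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        # Asks Nginx not to buffer this response, so chunks reach the client as produced.
        "X-Accel-Buffering": "no",
    }


def sse_event(payload):
    """Encode one JSON-serializable payload as a single SSE 'data:' frame."""
    return f"data: {json.dumps(payload)}\n\n".encode("utf-8")
```

Operators who previously set `proxy_buffering off` in their Nginx config for streaming routes may find the header makes that setting redundant, though that depends on the rest of the proxy configuration.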

RUNTIME · 1w ago

llama.cpp ships b9715

llama.cpp cut release b9715 on 2026-06-19. Release notes excerpt: "Ggml/cuda col2im 1d (#24417) * cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op * cuda: col2im_1d use fast_div_modulo for the index decomposition * cuda: col2im_1d tighten supports_op, type match and contiguou...

RUNTIME · 1w ago

llama.cpp ships b9716

llama.cpp cut release b9716 on 2026-06-19. Release notes excerpt: "mtmd: add batching support for internvl (#24775) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9716/llama-b9716-bin-macos-arm64.tar.gz) - macOS Apple Silic...

RUNTIME · 1w ago

ComfyUI ships v0.25.1

ComfyUI cut release v0.25.1 on 2026-06-18. Release notes excerpt: "* [Partner Nodes] feat(Kling): add support for Kling V3-Turbo model (https://github.com/Comfy-Org/ComfyUI/pull/14528). **Full Changelog**: https://github.com/Comfy-Org/ComfyUI/compare/v0.25.0...v0.25.1"

RUNTIME · 1w ago

llama.cpp ships b9704

llama.cpp cut release b9704 on 2026-06-18. Release notes excerpt: "server : return HTTP 400 on invalid grammar (#24144) (#24154) Throw on grammar parse failure so the server returns HTTP 400 instead of silently dropping the constraint. Add a regression test for the invalid-gramma...
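
For clients, the practical effect is that a malformed grammar should now surface as an error instead of unconstrained output. The sketch below shows one way a caller might handle that; the function name, the exception class, and the assumption that the error body is free-form text are illustrative only, and a 400 can also indicate other request problems.

```python
class BadRequestError(ValueError):
    """Hypothetical client-side error for an HTTP 400 response (e.g. an invalid grammar)."""


def check_completion_status(status, body):
    """Return the body on success; raise instead of silently accepting a rejected request."""
    if status == 400:
        # The server refused the request; do not retry with the same grammar.
        raise BadRequestError(f"server rejected request: {body}")
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return body
```

Callers that relied on the old behaviour (a constraint being dropped without an error) would need to validate grammars before sending, or catch the error and fall back explicitly.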

RUNTIME · 1w ago

llama.cpp ships b9702

llama.cpp cut release b9702 on 2026-06-18. Release notes excerpt: "server: fix router args not being forwarded to child instances (#24760) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9702/llama-b9702-bin-macos-arm64.tar....

RUNTIME · 1w ago

llama.cpp ships b9701

llama.cpp cut release b9701 on 2026-06-18. Release notes excerpt: "mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736) * add mtmd_image_preproc_out * add dev docs * remove unused clip API * rm unused clip_image_f32_batch::grid * change preprocess() call signature **m...

RUNTIME · 1w ago

llama.cpp ships b9703

llama.cpp cut release b9703 on 2026-06-18. Release notes excerpt: "server: (router) rework -hf preset repo (#24739) * server: temporary remove HF remote preset * rework remove preset.ini support * rm unused get_remote_preset_whitelist() * print warning * add docs * rm stray file...

RUNTIME · 1w ago

ollama ships v0.30.10

ollama cut release v0.30.10 on 2026-06-17. Release notes excerpt: "## What's Changed * models: add Cohere2MoE model by @jmorganca in https://github.com/ollama/ollama/pull/16670 * llama: update llama.cpp to b9672 by @pdevine in https://github.com/ollama/ollama/pull/16775 **Full Ch...

RUNTIME · 1w ago

llama.cpp ships b9691

llama.cpp cut release b9691 on 2026-06-17. Release notes excerpt: "ggml-cpu: Conditionally enable power11 backend based on compiler support (#24687) * ggml: Conditionally enable power11 backend based on compiler support Guard POWER11 backend creation behind a compiler flag check...

RUNTIME · 1w ago

llama.cpp ships b9692

llama.cpp cut release b9692 on 2026-06-17. Release notes excerpt: "mtmd: llava_uhd should no longer use batch dim (#24732) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9692/llama-b9692-bin-macos-arm64.tar.gz) - macOS Appl...

RUNTIME · 1w ago

llama.cpp ships b9688

llama.cpp cut release b9688 on 2026-06-17. Release notes excerpt: "server: (router) add model management API (#23976) * wip * server: (router) add SSE realtime updates API * nits * wip * add download API * add download api * update docs * add delete endpoint * fix std::terminate...

RUNTIME · 1w ago

llama.cpp ships b9689

llama.cpp cut release b9689 on 2026-06-17. Release notes excerpt: "metal : add f16 and bf16 support for concat operator (#24724) * metal : add f16 and bf16 support for concat operator Extend the Metal backend concat operator to support f16 and bf16 tensor types in addition to the...

RUNTIME · 1w ago

llama.cpp ships b9690

llama.cpp cut release b9690 on 2026-06-17. Release notes excerpt: "metal : implement rope_back operator (#24725) Reuse existing rope kernels with a function constant to toggle forward/backward rotation, avoiding duplicate kernel code. Assisted-by: pi:llama.cpp/Qwen3.6-27B **macOS...
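
The idea of toggling one kernel between forward and backward rotation can be illustrated outside Metal. For a single pair of values, the backward pass of a rotation by θ is a rotation by −θ, which only flips the sign of the sine term. The Python sketch below assumes that simplified per-pair view; `rope_pair` and its `backward` flag are hypothetical stand-ins for the kernel and its function constant, not the actual implementation.

```python
import math


def rope_pair(x0, x1, theta, backward=False):
    """Rotate the pair (x0, x1) by theta, or by -theta when backward is True."""
    # Negating the sine term is the only difference between the two directions.
    sin_t = -math.sin(theta) if backward else math.sin(theta)
    cos_t = math.cos(theta)
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)
```

Applying the forward rotation and then the backward one with the same angle recovers the original pair, which is why a single code path with a direction flag is enough.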

RUNTIME · 1w ago

llama.cpp ships b9678

llama.cpp cut release b9678 on 2026-06-17. Release notes excerpt: "opencl: optimize mul_mat_f16_f32_l4 for decode (#24504) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9678/llama-b9678-bin-macos-arm64.tar.gz) - macOS Appl...

RUNTIME · 1w ago

llama.cpp ships b9680

llama.cpp cut release b9680 on 2026-06-17. Release notes excerpt: "ci: fix vulkan docker images (#24595) * Update vulkan-shaders-gen.cpp * Update vulkan-shaders-gen.cpp add comment describing code change intention * Update vulkan-shaders-gen.cpp fix potential UB **macOS/iOS:** -...

RUNTIME · 1w ago

llama.cpp ships b9682

llama.cpp cut release b9682 on 2026-06-17. Release notes excerpt: "vulkan: record actual memory properties during buffer creation (#24326) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9682/llama-b9682-bin-macos-arm64.tar....

RUNTIME · 1w ago

llama.cpp ships b9674

llama.cpp cut release b9674 on 2026-06-17. Release notes excerpt: "SYCL: fix use-after-free bug with async memcpy in MoE prefill (#24676) * SYCL: fix a bug with async memcpy * make mmid_row_mapping_host persistent * comment on stream->wait * Apply suggestion from @sanmai * App...

RUNTIME · 1w ago

llama.cpp ships b9673

llama.cpp cut release b9673 on 2026-06-17. Release notes excerpt: "sycl: Add optional USM system allocations (#22526) This introduces an optional feature to allocate large GPU buffers (≥ 1GB) using USM system allocations if supported by the device. It allows using buffers from th...

RUNTIME · 1w ago

ollama ships v0.30.9

ollama cut release v0.30.9 on 2026-06-15. Release notes excerpt: "## What's Changed * Support for Cohere2Moe architecture * Fixed LFM2 parser/render for cases where thinking was not emitted * Fixed issue where `ollama launch claude` and other coding agent or assistant use cases w...

RUNTIME · 1w ago

llama.cpp ships b9672

llama.cpp cut release b9672 on 2026-06-16. Release notes excerpt: "vendor : update BoringSSL to 0.20260616.0 (#24693) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9672/llama-b9672-bin-macos-arm64.tar.gz) - macOS Apple Sil...

RUNTIME · 1w ago

ComfyUI ships v0.25.0

ComfyUI cut release v0.25.0 on 2026-06-16. Release notes excerpt: "## What's Changed * chore(openapi): sync shared API contract from cloud@7c470f0 by @comfy-pr-bot in https://github.com/Comfy-Org/ComfyUI/pull/14174 * fix: Image grid bug fix (CORE-215) by @yousef-rafat in https://...

RUNTIME · 1w ago

llama.cpp ships b9669

llama.cpp cut release b9669 on 2026-06-16. Release notes excerpt: "spec: add backend sampling support for eagle3 (#24655) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9669/llama-b9669-bin-macos-arm64.tar.gz) - macOS Apple...

RUNTIME · 1w ago

llama.cpp ships b9670

llama.cpp cut release b9670 on 2026-06-16. Release notes excerpt: "Fix and restrict NVFP4 edge-cases in llama-graph (#24331) * Move post-GEMM MUL required for dequant b4 lora and bias add see https://github.com/ggml-org/llama.cpp/pull/23484 : 1. For lora, I would presume we want...

RUNTIME · 1w ago

llama.cpp ships b9668

llama.cpp cut release b9668 on 2026-06-16. Release notes excerpt: "vulkan: prefer host-visible memory buffers on UMA devices (#22930) * implement UMA host-visible memory * update based on 0cc4m's suggestion **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-or...

RUNTIME · 1w ago

llama.cpp ships b9663

llama.cpp cut release b9663 on 2026-06-16. Release notes excerpt: "[SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND (#24363) * support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND * fix conflict * rebase, support new UT case of repeat, concat **macOS/...

RUNTIME · 1w ago

llama.cpp ships b9664

llama.cpp cut release b9664 on 2026-06-16. Release notes excerpt: "sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (#24452) * sycl: support reordered Q4_K and Q5_K MoE MUL_MAT_ID Extend reordered-weight handling to fused MoE MUL_MAT_ID for Q4_K and Q5_K expert tensors and a...

RUNTIME · 1w ago

llama.cpp ships b9665

llama.cpp cut release b9665 on 2026-06-16. Release notes excerpt: "bench : add --offline (#24511) * bench : add --offline Signed-off-by: Adrien Gallouët * Add default Signed-off-by: Adrien Gallouët --------- Signed-off-by: Adrien Gallouët **macOS/iOS:** - [macOS Apple Silicon (ar...

RUNTIME · 1w ago

llama.cpp ships b9658

llama.cpp cut release b9658 on 2026-06-15. Release notes excerpt: "chat: include full unparsed prompt in debug (#24650) message on parse error **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9658/llama-b9658-bin-macos-arm64....

RUNTIME · 1w ago

llama.cpp ships b9659

llama.cpp cut release b9659 on 2026-06-15. Release notes excerpt: "mtmd: fix miscounting n_tokens (#24656) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9659/llama-b9659-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64...

RUNTIME · 1w ago

llama.cpp ships b9660

llama.cpp cut release b9660 on 2026-06-15. Release notes excerpt: "chat : fix LFM2 tool-call parsing double-escaping (#24667) * Add escape test cases * chat : fix LFM2 tool-call parsing double-escaping **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/lla...

RUNTIME · 1w ago

llama.cpp ships b9654

llama.cpp cut release b9654 on 2026-06-15. Release notes excerpt: "mtmd : add post-decode callback (#24645) Assisted-by: pi:llama.cpp/Qwen3.6-27B **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9654/llama-b9654-bin-macos-arm...

RUNTIME · 1w ago

llama.cpp ships b9655

llama.cpp cut release b9655 on 2026-06-15. Release notes excerpt: "chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes (#24653) * chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes * update erroneous case in...

RUNTIME · 1w ago

llama.cpp ships b9656

llama.cpp cut release b9656 on 2026-06-15. Release notes excerpt: "chat: harden peg-native tool call parsing (#24329) * chat: harden peg-native tool call parsing accept an optional leading type: function field in build_json_tools_flat_keys so openai style tool calls parse on temp...

RUNTIME · 1w ago

llama.cpp ships b9647

llama.cpp cut release b9647 on 2026-06-15. Release notes excerpt: "[SYCL] add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp (#24584) * add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp * update ops.md **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://gi...

RUNTIME · 1w ago

llama.cpp ships b9649

llama.cpp cut release b9649 on 2026-06-15. Release notes excerpt: "sycl : fix reorder function; add fp32/fp16 in build script (#24578) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9649/llama-b9649-bin-macos-arm64.tar.gz)...

RUNTIME · 1w ago

llama.cpp ships b9650

llama.cpp cut release b9650 on 2026-06-15. Release notes excerpt: "sycl: fix soft_max_f32 max reduction (#24451) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9650/llama-b9650-bin-macos-arm64.tar.gz) - macOS Apple Silicon...

RUNTIME · 1w ago

llama.cpp ships b9642

llama.cpp cut release b9642 on 2026-06-15. Release notes excerpt: "CUDA: only support F32/F16 for GGML_OP_REPEAT (#24533) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9642/llama-b9642-bin-macos-arm64.tar.gz) - macOS Apple...

RUNTIME · 1w ago

llama.cpp ships b9632

llama.cpp cut release b9632 on 2026-06-14. Release notes excerpt: "jinja : add count/d/e filter aliases (#24606) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9632/llama-b9632-bin-macos-arm64.tar.gz) - macOS Apple Silicon...

RUNTIME · 1w ago

llama.cpp ships b9637

llama.cpp cut release b9637 on 2026-06-14. Release notes excerpt: "chat: add dedicated Cohere2MoE (North Code) parser (#24615) * chat: add dedicated Cohere2MoE (North Code) parser * Some renames to make @CISC happy :> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://gith...

RUNTIME · 2w ago

llama.cpp ships b9631

llama.cpp cut release b9631 on 2026-06-14. Release notes excerpt: "cli : fix not copying preserved tokens (#24258) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9631/llama-b9631-bin-macos-arm64.tar.gz) - macOS Apple Silico...

RUNTIME · 2w ago

llama.cpp ships b9630

llama.cpp cut release b9630 on 2026-06-14. Release notes excerpt: "Add cohere2moe to llama-vocab for TINY_AYA (#24601) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9630/llama-b9630-bin-macos-arm64.tar.gz) - macOS Apple Si...

RUNTIME · 2w ago

llama.cpp ships b9627

llama.cpp cut release b9627 on 2026-06-13. Release notes excerpt: "ui : fix llama-ui-embed crash when no asset dir is given (#24597) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9627/llama-b9627-bin-macos-arm64.tar.gz) -...

RUNTIME · 2w ago

llama.cpp ships b9628

llama.cpp cut release b9628 on 2026-06-14. Release notes excerpt: "add sycl to check-release (#24583) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9628/llama-b9628-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, Kle...

RUNTIME · 2w ago

llama.cpp ships b9624

llama.cpp cut release b9624 on 2026-06-13. Release notes excerpt: "ui: build-time gzip compression (#24571) * ui: keep original file name and path * fix nocache * ui: build-time gzip compression **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/...

RUNTIME · 2w ago

llama.cpp ships b9625

llama.cpp cut release b9625 on 2026-06-13. Release notes excerpt: "jinja : fix negative step slice with start/stop values (#24580) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9625/llama-b9625-bin-macos-arm64.tar.gz) - ma...
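
Jinja slice expressions follow Python slice semantics, so Python itself is a convenient reference for what a negative-step slice with explicit bounds should return. The excerpt does not say which case was failing, so the sketch below only shows the expected reference behaviour, assuming llama.cpp's template engine aims to match it; `take_slice` is a hypothetical helper.

```python
def take_slice(seq, start, stop, step):
    """Apply seq[start:stop:step], with None meaning 'omitted'."""
    return seq[slice(start, stop, step)]


items = list(range(6))  # [0, 1, 2, 3, 4, 5]

# With a negative step, start is inclusive and stop is exclusive, walking downward.
down_from_4 = take_slice(items, 4, 1, -1)     # indices 4, 3, 2
down_to_3 = take_slice(items, None, 2, -1)    # indices 5, 4, 3
every_other = take_slice(items, 3, None, -2)  # indices 3, 1
```

A chat template that reverses or windows a message list with such a slice is a quick regression check after upgrading.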

RUNTIME · 2w ago

llama.cpp ships b9626

llama.cpp cut release b9626 on 2026-06-13. Release notes excerpt: "Add arch support for cohere2-MoE (#24260) * Add arch support for cohere2-MoE * Removed redundant gating_func checks * Changed ffn lookup to prefer prefix_dense_intermediate_size * Renamed arch to cohere2moe * Remo...