other
32B parameters
Commercial OK
Reviewed May 2026

Pollux Judge 32B

A 32B judge model built to score other LLMs' Russian-language outputs. Give it an instruction, a response, and a rubric, and it returns a numeric score plus a written rationale. Built on T-pro-it-1.0 and trained entirely on synthetic data from the POLLUX dataset.

License: MIT · Context: 4,096 tokens

Our verdict

OP · Fredoline Eruo | Verified May 28, 2026
9.3/10

If you're running Russian-language model evals and need a local, auditable judge, this is a credible option: MIT license, structured output, no API dependency. The single-criterion-per-run constraint is a real workflow cost, though: evaluating on multiple axes means multiple forward passes. At 32B it's not cheap to run, and 1,010 downloads with 5 likes suggest limited community validation so far. Worth testing if you have a clear rubric workflow; skip it if you need flexible or multi-dimensional scoring out of the box.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. The license is explicitly MIT on the HF card, and commercial use is correctly flagged. Metadata aligns with the card: 32B parameters, Russian, fine-tuned from T-pro-it-1.0 (Qwen2 family; 'other' is acceptable since it's a judge derivative). The description and verdict are honest, operator-voiced, and call out real constraints: a single criterion per run, synthetic training data, a tight 4,096-token context, and a weak community signal. The use case is sharply scoped to Russian LLM eval pipelines, a narrow but legitimate niche for local-first ops teams. Minor concern: the 4,096-token context length isn't directly verified in the excerpt shown but is plausible for a T-pro-it-1.0 derivative, so it is worth a second check.

Flags:

  • contextLength 4096 not explicitly confirmed in README excerpt; verify against base model T-pro-it-1.0 config
  • Niche brand fit: Russian-only judge has limited audience for runlocalai's primarily English-speaking operator base

Strengths

  • MIT license, commercial use permitted
  • Returns both a numeric score and a text rationale in one pass
  • Structured rubric input keeps scoring consistent across runs
  • 32B scale gives it headroom for nuanced Russian-language judgment

Weaknesses

  • One criterion per run only: passing several criteria in a single run is unsupported and gives unpredictable results
  • You must supply explicit criteria; the model will not choose its own
  • Trained entirely on synthetic data, which may not reflect messy real-world responses
  • 4096-token context is tight for long response evaluation tasks
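
The one-criterion-per-run constraint turns a multi-axis evaluation into a loop with one judge call per criterion. A minimal sketch of that pattern, assuming a hypothetical `judge` callable that wraps however you serve the model and returns a `(score, rationale)` pair. The names `Criterion`, `evaluate`, and `stub_judge` are illustrative and not part of the model's API, and the rubric texts are made up:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: (instruction, response, criterion name, rubric)
# -> (score, rationale). Adapt it to your own serving setup.
JudgeFn = Callable[[str, str, str, str], Tuple[float, str]]


@dataclass(frozen=True)
class Criterion:
    name: str    # short label, e.g. "fluency"
    rubric: str  # explicit scoring rules; the model will not choose its own


def evaluate(instruction: str, response: str,
             criteria: List[Criterion],
             judge: JudgeFn) -> Dict[str, Tuple[float, str]]:
    """Run one judge call per criterion and collect results by name."""
    results: Dict[str, Tuple[float, str]] = {}
    for criterion in criteria:
        # One criterion per forward pass; rubrics are never merged.
        results[criterion.name] = judge(
            instruction, response, criterion.name, criterion.rubric
        )
    return results


def stub_judge(instruction: str, response: str,
               name: str, rubric: str) -> Tuple[float, str]:
    """Stand-in for a real model call, used only to show the call pattern."""
    return 1.0, f"stub rationale for {name}"


criteria = [
    Criterion("fluency", "0 = unreadable, 2 = natural Russian"),
    Criterion("accuracy", "0 = wrong, 2 = fully correct"),
]
scores = evaluate("Переведи: hello", "привет", criteria, stub_judge)
```

Cost scales linearly with this approach: N criteria mean N forward passes of a 32B model per response.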

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          17.6 GB      23 GB
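
As a rough worked example, the Q4_K_M figures can be turned into an implied bits-per-weight value and a runtime headroom number. This sketch assumes exactly 32 billion parameters and 1 GB = 10^9 bytes, neither of which the listing states precisely; the function names are illustrative:

```python
# Figures taken from the Q4_K_M row; the parameter count is the headline "32B".
FILE_SIZE_GB = 17.6
VRAM_REQUIRED_GB = 23.0
PARAMS_BILLIONS = 32.0  # assumption: exactly 32e9 weights, 1 GB = 1e9 bytes


def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter, implied by the file size."""
    return file_size_gb * 8 / params_billions


def runtime_headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM left over once the weights themselves are loaded."""
    return vram_gb - file_size_gb


def fits(card_vram_gb: float, required_gb: float = VRAM_REQUIRED_GB) -> bool:
    """True if a card meets the listed VRAM requirement."""
    return card_vram_gb >= required_gb


print(round(bits_per_weight(FILE_SIZE_GB, PARAMS_BILLIONS), 2))
print(round(runtime_headroom_gb(VRAM_REQUIRED_GB, FILE_SIZE_GB), 1))
```

Under those assumptions the file works out to about 4.4 bits per weight, and the listed requirement leaves about 5.4 GB beyond the weights, which presumably covers the KV cache and runtime buffers. A 24 GB card meets the listed figure; a 16 GB card does not.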

Get the model

HuggingFace

Original weights

huggingface.co/ai-forever/pollux-judge-32b

Source repository with the original weights; you need to quantize them yourself.

Frequently asked

What's the minimum VRAM to run Pollux Judge 32B?

23 GB of VRAM is enough to run Pollux Judge 32B at the Q4_K_M quantization (file size 17.6 GB). Higher-quality quantizations need more.

Can I use Pollux Judge 32B commercially?

Yes. Pollux Judge 32B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Pollux Judge 32B?

Pollux Judge 32B supports a context window of 4,096 tokens (about 4K).
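
Because the instruction, the response under review, the rubric, and the judge's own output all share that window, a pre-flight budget check can catch oversized inputs before a run. A minimal sketch, assuming a crude characters-per-token estimate and a 512-token output reserve; both are placeholders, not measured values, and in practice you would pass a `count_tokens` function backed by the model's real tokenizer:

```python
from typing import Callable

CONTEXT_LIMIT = 4096  # from the model listing


def rough_token_count(text: str) -> int:
    """Crude placeholder: one token per 3 characters, rounded up.

    This is an assumption for illustration, not the real tokenizer.
    """
    return -(-len(text) // 3)  # ceiling division


def fits_context(instruction: str, response: str, rubric: str,
                 count_tokens: Callable[[str], int] = rough_token_count,
                 reserve_for_output: int = 512,
                 limit: int = CONTEXT_LIMIT) -> bool:
    """True if the three inputs plus an output reserve fit in the window."""
    used = sum(count_tokens(part) for part in (instruction, response, rubric))
    return used + reserve_for_output <= limit


short_ok = fits_context("a" * 30, "b" * 300, "c" * 30)
too_long = fits_context("", "x" * 12000, "")
```

Responses that fail the check need to be truncated or split before judging; the sketch only reports the problem and does not choose a strategy.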

Source: huggingface.co/ai-forever/pollux-judge-32b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
