Family: other

32B parameters

Commercial OK

Reviewed May 2026

Pollux Judge 32B

A 32B judge model built to score other LLMs' Russian-language outputs. Give it an instruction, a response, and a rubric, and it returns a numeric score plus a written rationale. Built on T-pro-it-1.0 and trained entirely on synthetic data from the POLLUX dataset.

License: MIT · Context: 4,096 tokens


Our verdict

Fredoline Eruo · Verified May 28, 2026

9.3/10

If you're running Russian-language model evals and need a local, auditable judge, this is a credible option: MIT license, structured output, no API dependency. The single-criterion-per-run constraint is a real workflow cost, though: evaluating on multiple axes means one forward pass per axis. At 32B it's not cheap to run, and 1,010 downloads with 5 likes suggests limited community validation so far. Worth testing if you have a clear rubric workflow; skip it if you need flexible or multi-dimensional scoring out of the box.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. License is explicit MIT on the HF card and commercial use is correctly flagged. Metadata aligns with the card: 32B params, Russian, finetuned from T-pro-it-1.0 (Qwen2 family — 'other' is acceptable since it's a judge derivative). Description and verdict are honest, operator-voiced, and call out real constraints (single-criterion-per-run, synthetic training data, 4096 ctx tightness, weak community signal). Use case is sharply scoped to Russian LLM eval pipelines, which is a narrow but legitimate niche for local-first ops teams. Minor concern: context length of 4096 isn't directly verified in the excerpt shown but is plausible for a T-pro-it-1.0 derivative — worth a second check.

Flags:
- contextLength 4096 not explicitly confirmed in README excerpt; verify against base model T-pro-it-1.0 config
- Niche brand fit: Russian-only judge has limited audience for runlocalai's primarily English-speaking operator base
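The first flag can be checked locally once the config is downloaded. A minimal sketch, assuming you have the `config.json` of T-pro-it-1.0 (or of this model) on disk: it reads `max_position_embeddings`, the key Hugging Face transformer configs commonly use for positional capacity. The function names are hypothetical. That value can exceed the context the judge was trained or evaluated with, so a mismatch with 4,096 is a prompt to dig further, not proof of an error.

```python
import json
from pathlib import Path
from typing import Optional


def context_length_from_json(config_text: str) -> Optional[int]:
    """Return max_position_embeddings from config.json text, or None if absent."""
    config = json.loads(config_text)
    return config.get("max_position_embeddings")


def read_context_length(config_path: str) -> Optional[int]:
    """Read a local config.json (hypothetical path) and report its declared capacity."""
    return context_length_from_json(Path(config_path).read_text(encoding="utf-8"))
```

Usage would look like `read_context_length("T-pro-it-1.0/config.json")`, where the path is a placeholder for wherever you saved the file.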


Strengths

MIT license, commercial use permitted
Returns both a numeric score and a text rationale in one pass
Structured rubric input keeps scoring consistent across runs
32B scale gives it headroom for nuanced Russian-language judgment
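Because the score and rationale come back in one generation, downstream code has to split them. The exact output format isn't given here, so this sketch assumes a hypothetical layout in which a line such as `Score: 2` appears somewhere in the text; `JudgeVerdict` and `parse_verdict` are illustrative names, not part of any published API. Check the model card for the real format before adapting the pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeVerdict:
    score: Optional[float]
    rationale: str


# Hypothetical layout: a standalone line like "Score: 2" (case-insensitive).
_SCORE_LINE = re.compile(
    r"^\s*score\s*[:=]\s*(-?\d+(?:\.\d+)?)\s*$", re.IGNORECASE | re.MULTILINE
)


def parse_verdict(raw: str) -> JudgeVerdict:
    """Split raw judge output into a numeric score and the remaining rationale."""
    match = _SCORE_LINE.search(raw)
    if match is None:
        # No recognizable score line: keep everything as rationale, flag score as missing.
        return JudgeVerdict(score=None, rationale=raw.strip())
    score = float(match.group(1))
    # Everything except the score line is treated as the rationale.
    rationale = (raw[: match.start()] + raw[match.end():]).strip()
    return JudgeVerdict(score=score, rationale=rationale)
```

Returning `None` rather than raising keeps a batch eval running when one output is malformed; those rows can be filtered and re-run afterwards.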

Weaknesses

One criterion per run only: multi-criteria evaluation is unsupported, and results are unpredictable if you combine criteria in one prompt
You must supply explicit criteria; the model will not choose its own
Trained entirely on synthetic data, which may not reflect messy real-world responses
4,096-token context is tight for long responses, since the instruction, response, rubric, and generated rationale all have to fit in it
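The first two weaknesses shape the eval loop: criteria must be supplied explicitly, and each one needs its own judge call. A minimal sketch, assuming a hypothetical `judge` callable that wraps one forward pass of Pollux Judge 32B and returns `(score, rationale)`; the callable and `evaluate_per_criterion` are illustrative, not a published interface.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical judge callable: (instruction, response, criterion) -> (score, rationale).
# In a real pipeline this would wrap one generation call with a single-criterion rubric.
JudgeFn = Callable[[str, str, str], Tuple[float, str]]


def evaluate_per_criterion(
    judge: JudgeFn,
    instruction: str,
    response: str,
    criteria: List[str],
) -> Dict[str, Tuple[float, str]]:
    """Run the judge once per criterion, since it scores one criterion per run."""
    if not criteria:
        # The model will not choose its own criteria, so refuse an empty list.
        raise ValueError("explicit criteria are required")
    results: Dict[str, Tuple[float, str]] = {}
    for criterion in criteria:
        # One prompt, one criterion: never pack several criteria into one call.
        results[criterion] = judge(instruction, response, criterion)
    return results
```

The cost is linear: scoring a response on N criteria means N forward passes of a 32B model, which is the workflow cost the verdict calls out.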

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	17.6 GB	23 GB

Get the model

Hugging Face

Original weights

huggingface.co/ai-forever/pollux-judge-32b

Source repository with original weights only; you need to quantize them yourself before running locally.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Pollux Judge 32B.

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · amd

NVIDIA H100 NVL

188GB · nvidia


Frequently asked

What's the minimum VRAM to run Pollux Judge 32B?

23GB of VRAM is enough to run Pollux Judge 32B at the Q4_K_M quantization (file size 17.6 GB). Higher-quality quantizations need more.
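Part of the gap between the 17.6 GB file and the 23 GB VRAM figure is the KV cache for the context window. A rough sketch of that one component, assuming a Qwen2.5-32B-style shape for this Qwen2-family derivative (64 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache); verify those numbers against the model's `config.json`. Compute buffers and runtime overhead are not modelled here.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer, KV head, and position."""
    # The leading factor of 2 accounts for the separate K and V tensors.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


# Assumed architecture values (not taken from the model card): check config.json.
cache = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=4096)
print(f"KV cache at 4,096 tokens: {cache / 2**30:.2f} GiB")
```

With these assumed values the cache comes to exactly 2^30 bytes (1 GiB), so under this assumption most of the remaining headroom in the 23 GB figure would go to compute buffers and runtime overhead rather than the cache.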

Can I use Pollux Judge 32B commercially?

Yes. Pollux Judge 32B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Pollux Judge 32B?

Pollux Judge 32B supports a context window of 4,096 tokens.
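Since the instruction, rubric, response, prompt template, and generated rationale all share that window, it helps to know how long a response can be before a run overflows. A minimal budgeting sketch; the function name and the example token counts are hypothetical, and real counts should come from the model's own tokenizer.

```python
def response_token_budget(instruction_tokens: int, rubric_tokens: int,
                          reserved_output_tokens: int,
                          template_overhead_tokens: int = 0,
                          context_window: int = 4096) -> int:
    """Tokens left for the response under evaluation; 0 means nothing fits."""
    used = (instruction_tokens + rubric_tokens
            + reserved_output_tokens + template_overhead_tokens)
    return max(context_window - used, 0)


# Hypothetical counts: 300-token instruction, 400-token rubric,
# 512 tokens reserved for score + rationale, 100 tokens of template.
budget = response_token_budget(300, 400, 512, template_overhead_tokens=100)
```

With those illustrative counts the response can be at most 2,784 tokens. Longer responses would need truncation or chunked evaluation, which is where the 4,096-token limit starts to bite.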

Source: huggingface.co/ai-forever/pollux-judge-32b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
