gemma

11.8B parameters

Commercial OK

Multimodal

Reviewed May 2026

Trendyol LLM Asure 12B

Trendyol LLM Asure 12B is a Gemma 3 based multimodal instruct model for Turkish and English business workflows. The public Ollama build used in local testing is the alibayram GGUF distribution.

License: Gemma · Context: 131,072 tokens

How to run it

The locally tested route is Ollama with the alibayram/Trendyol-LLM-Asure-12B:latest tag, which points at the Q4_K_M GGUF mirror. On a 16GB RTX 5080 it loads comfortably for text-only chat and TurkishMMLU-style evaluation; keep num_ctx explicit because Ollama defaults can silently truncate 5-shot benchmark prompts.
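One way to keep num_ctx explicit is to set it per request through Ollama's native /api/chat endpoint. The sketch below is illustrative only: the URL assumes Ollama's usual local address, the 8192 value and the helper names are hypothetical, and none of it is taken from the published runner.

```python
import json
import urllib.request

MODEL_TAG = "alibayram/Trendyol-LLM-Asure-12B:latest"
OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint


def build_chat_payload(prompt: str, num_ctx: int = 8192) -> dict:
    """Build a non-streaming chat request with an explicit context window."""
    return {
        "model": MODEL_TAG,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Setting num_ctx here avoids relying on the server-side default.
        "options": {"num_ctx": num_ctx, "temperature": 0},
    }


def send_chat(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]


# Usage (requires a running Ollama server with the model pulled):
#   print(send_chat(build_chat_payload("Merhaba! Kendini bir cümleyle tanıt.")))
```

Keeping the payload builder separate from the network call makes it easy to log the exact options alongside each benchmark result.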

Hardware guidance

The Q4_K_M GGUF is 7.3GB on disk. Plan for roughly 10GB+ of VRAM for normal chat and more headroom as context grows. The 131K advertised context is useful for long inputs, but high-context serving should be profiled because KV cache, batch size, and image inputs can dominate memory.
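To see why KV cache can dominate at long context, a rough upper bound can be computed from the attention shape. The layer count, KV-head count, and head dimension below are illustrative placeholders, not values confirmed by this page; read the real ones from the model's config before relying on the numbers. The formula assumes an fp16 cache and that every layer caches every token, so runtimes that use sliding-window layers or a quantized cache should need less.

```python
def kv_cache_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Upper-bound KV cache size, assuming every layer caches every token."""
    # Factor 2 covers the separate key and value tensors.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return tokens * per_token


# Placeholder architecture values (assumptions for illustration only).
ASSUMED_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=256)

for ctx in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(ctx, **ASSUMED_SHAPE) / 2**30
    print(f"num_ctx={ctx:>7}: ~{gib:.1f} GiB KV cache (fp16, full attention)")
```

Because the bound grows linearly with tokens, doubling num_ctx doubles the worst-case cache, which is why high-context serving needs its own profiling pass.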

What breaks first

The first failure mode is context truncation: use a fixed num_ctx for benchmark runs. The second is over-reading the model as a general world-knowledge system; its own card says world knowledge is intentionally limited. Vision capability is part of the base model, but this TurkishMMLU run is text-only.

Runtime recommendation

Use Ollama for quick local text runs and llama.cpp or vLLM when you need tighter control over context, batching, or production serving. For reproducible quality runs, pin the runtime version, quant, hardware, and num_ctx, and publish the raw log.

Common beginner mistakes

Do not benchmark the model with Ollama's default 2048 context. Do not compare this Q4_K_M local run to BF16 vendor claims without labeling the quant. Do not treat the multimodal claim as measured by TurkishMMLU; this benchmark covers text-only Turkish multiple-choice reasoning.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (gemma-3)

Trendyol LLM Asure 12B · 11.8B (you are here)

Strengths

Strong domestic business-workflow positioning
Gemma 3 multimodal lineage with Turkish and English coverage
Multiple local RTX 5080 measurements are now available

Weaknesses

The benchmarked Ollama summary reports quantization as unknown
12B class is slower than compact 2B-9B Turkish models
Vision quality needs a separate multimodal benchmark, not a text TPS row

Prompting kit

✓ Tested by runlocalai on 2026-05-27 · rtx-5080

Tested patterns for getting the most out of Trendyol LLM Asure 12B locally. Local models are pickier about prompt structure than cloud models — what works on Claude or GPT-5 often fails here.

Quirks to know

• Gemma-style <start_of_turn>/<end_of_turn> chat template
• Pass num_ctx explicitly for benchmark prompts
• The model is tuned for concise business-task responses, not broad trivia

Chat template

Gemma 3

Ollama injects the system prompt into the first user turn and uses Gemma turn markers.
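The turn structure described above can be sketched as a small formatter. This is a hypothetical helper for understanding the layout, not Ollama's actual template: it omits any beginning-of-sequence token, and exact whitespace may differ from what the runtime emits. In normal use Ollama applies the template for you.

```python
def format_gemma_prompt(messages):
    """Render chat messages with Gemma-style turn markers.

    A system message is folded into the next user turn rather than
    getting a turn of its own.
    """
    system_text = ""
    parts = []
    for message in messages:
        role = message["role"]
        content = message["content"].strip()
        if role == "system":
            system_text = content
            continue
        if role == "user" and system_text:
            content = f"{system_text}\n\n{content}"
            system_text = ""
        # Gemma names the assistant side "model".
        turn_role = "model" if role == "assistant" else "user"
        parts.append(f"<start_of_turn>{turn_role}\n{content}<end_of_turn>\n")
    # Leave the prompt open on a model turn so generation continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(format_gemma_prompt([
    {"role": "system", "content": "Yanıtları kısa tut."},
    {"role": "user", "content": "İade politikası nedir?"},
]))
```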

Tool calling

✗ Not supported

No native tool-calling format was advertised or tested for this local benchmark.

Sampler settings

temperature: 0
top_p: 1

Quality benchmarks use deterministic generation with max_tokens=8 and letter parsing.
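With completions capped at 8 tokens, letter parsing can be as simple as taking the first standalone option letter. The function below is a minimal sketch of that idea, not the parser used for the published scores; the A to E choice set and the function names are assumptions.

```python
import re


def parse_choice(completion, choices="ABCDE"):
    """Return the first standalone choice letter in a completion, or None."""
    # Word boundaries stop the "C" in a word like "Cevap" from matching.
    match = re.search(rf"\b([{choices}])\b", completion.strip())
    return match.group(1) if match else None


def accuracy(completions, gold_letters):
    """Fraction of completions whose parsed letter equals the gold letter."""
    hits = sum(
        parse_choice(text) == gold
        for text, gold in zip(completions, gold_letters)
    )
    return hits / len(gold_letters)


print(parse_choice("Cevap: C"))
print(accuracy(["B", "D) Ankara", "bilmiyorum"], ["B", "D", "A"]))
```

Unparseable completions return None and count as wrong, which keeps refusals and off-format answers from being scored as correct by accident.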

Reviewed quality benchmarks

First-party rows were run by RunLocalAI; reviewed community rows are labeled in the data. Every row links to the raw test-run log.

Benchmark	Tested	Quant	Runtime	Hardware	Score	Raw log
MBPP+	2026-05-27	Q4_K_M	ollama-0.24.0	rtx-5080	71.7/100	Gist
HumanEval+	2026-05-27	Q4_K_M	ollama-0.24.0	rtx-5080	69.5/100	Gist
TurkishMMLU (Generative)	2026-05-27	Q4_K_M	ollama-0.24.0	rtx-5080	58.9/100	Gist

Q4_K_M note (MBPP+): First-party measured MBPP+ run. Generation used Ollama's OpenAI-compatible chat endpoint at temperature 0 and num_ctx 8192. Scoring used official EvalPlus 0.3.1 under WSL; public Gist includes metadata, generation log, official scorer log, sanitized samples, and raw model completions.

Q4_K_M note (HumanEval+): First-party measured HumanEval+ run. Generation used Ollama's OpenAI-compatible chat endpoint at temperature 0 and num_ctx 8192. Scoring used official EvalPlus 0.3.1 under WSL; public Gist includes metadata, generation log, official scorer log, sanitized samples, and raw model completions.

Q4_K_M note (TurkishMMLU): First-party text-only TurkishMMLU generative run on local Ollama tag alibayram/Trendyol-LLM-Asure-12B:latest. Source model card: alibayram/Trendyol-LLM-Asure-12B; local GGUF source: alibayram/Trendyol-LLM-Asure-12B-Q4_K_M-GGUF. Hardware: RTX 5080 16GB, NVIDIA driver 595.97.

Want to verify? Every row links to its Gist with full stdout and stderr of the run. The runner script is in the public repo (scripts/run-humaneval-plus.ts) — reproducible end-to-end. Browse all coding scores at /benchmarks/coding.

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M (reported as GGUF_UNKNOWN in the Ollama summary)	7.3 GB	10 GB
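The file size and parameter count on this page allow a quick sanity check of the quant label. The arithmetic below assumes GB means 10^9 bytes and that the file is almost entirely weights, ignoring metadata and any non-quantized tensors, so treat the result as approximate.

```python
def bits_per_weight(file_size_gb, params_billion):
    """Average bits stored per parameter, from file size and parameter count."""
    # GB * 8 gives gigabits; dividing by billions of parameters gives bits each.
    return file_size_gb * 8 / params_billion


bpw = bits_per_weight(file_size_gb=7.3, params_billion=11.8)
print(f"~{bpw:.2f} bits per weight")
```

A result a little under 5 bits per weight is in the range one would expect from a 4-bit K-quant, which is consistent with the Q4_K_M label used elsewhere on this page.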

Get the model

Ollama

One-line install

ollama run alibayram/Trendyol-LLM-Asure-12B:latest

HuggingFace

Original weights

huggingface.co/Trendyol/Trendyol-LLM-Asure-12B

Source repository with the original weights. You need to quantize them yourself to get a GGUF build.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

4 runs on record

Hardware	Provenance	Quant	Ctx	Tokens / sec	TTFT	Date
NVIDIA GeForce RTX 5080	Editorial (M)	Q4_K_M	4K	82.0 tok/s	136 ms	May 28, 2026
NVIDIA GeForce RTX 5080	Editorial (M)	unknown	2K	79.1 tok/s	—	May 28, 2026
NVIDIA GeForce RTX 5080	Editorial (M)	Q4_K_M	8K	61.5 tok/s	323 ms	May 27, 2026
NVIDIA GeForce RTX 3080 16GB (Mobile)	Editorial (M)	Q4_K_M	4K	43.4 tok/s	391 ms	Jun 2, 2026
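The throughput and TTFT columns can be combined into a rough response-time estimate. This sketch assumes the decode rate stays constant over the whole reply and that TTFT matches the benchmarked prompt length; real prompts will shift both. The 256-token reply length is an arbitrary example.

```python
def estimate_seconds(output_tokens, tokens_per_sec, ttft_ms=0.0):
    """Approximate wall-clock time: time to first token plus decode time."""
    return ttft_ms / 1000 + output_tokens / tokens_per_sec


# (label, tokens/sec, TTFT in ms) copied from the Q4_K_M rows that report TTFT.
rows = [
    ("RTX 5080, 4K ctx", 82.0, 136),
    ("RTX 5080, 8K ctx", 61.5, 323),
    ("RTX 3080 16GB (Mobile), 4K ctx", 43.4, 391),
]

for label, tps, ttft in rows:
    seconds = estimate_seconds(256, tps, ttft)
    print(f"{label}: ~{seconds:.1f} s for a 256-token reply")
```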

Hardware that runs this

Cards with enough VRAM for at least one quantization of Trendyol LLM Asure 12B.

NVIDIA B300 (Blackwell Ultra)

Frequently asked

What's the minimum VRAM to run Trendyol LLM Asure 12B?

10GB of VRAM is enough to run Trendyol LLM Asure 12B at the Q4_K_M quantization (file size 7.3 GB; the Ollama summary lists it as GGUF_UNKNOWN). Higher-quality quantizations need more.

Can I use Trendyol LLM Asure 12B commercially?

Yes. Trendyol LLM Asure 12B ships under the Gemma license, which permits commercial use. Always read the license text before deployment.

What's the context length of Trendyol LLM Asure 12B?

Trendyol LLM Asure 12B supports a context window of 131,072 tokens (about 131K).

How do I install Trendyol LLM Asure 12B with Ollama?

Run `ollama pull alibayram/Trendyol-LLM-Asure-12B:latest` to download, then `ollama run alibayram/Trendyol-LLM-Asure-12B:latest` to start a chat session. The default quantization is Q4_K_M.

Does Trendyol LLM Asure 12B support images?

Yes — Trendyol LLM Asure 12B is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/Trendyol/Trendyol-LLM-Asure-12B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Alternatives

Gemma 3 1B, Gemma 3 4B, Gemma 3 12B, Gemma 3 27B
