Gemini (Google)

by Google DeepMind

Google's closed-weight model family: Gemini Pro, Gemini Ultra, and Gemini Flash. Open-weight derivatives ship as [Gemma](/families/gemma). Gemini serves as the reference baseline for capability comparisons.

Best entry point for local use

Gemini is a closed-weight family, so there are no Gemini models for local deployment. Google's open-weight alternative is Gemma 3 12B (Gemini-distilled, released under Google's Gemma Terms of Use rather than Apache 2.0), which runs on an RTX 3060 12GB at Q4 (~7 GB of weights). Gemma 3 captures ~50-60% of Gemini 2.0 Flash quality on benchmarks at an estimated ~1/200th the parameter count (Google does not publish Gemini's size), which makes it the pragmatic local entry point for Gemini-style capability.

For true Gemini-level performance with open weights, the option is DeepSeek V4 at FP8 on 8× H100 SXM, which matches Gemini 2.5 Pro on math and code.

Gemini's native multimodality (text, image, audio, and video input in a single model) has no open-weight equivalent: InternVL2 handles text and image, Whisper handles audio, and Wan handles video, but no single open-weight model processes all modalities natively.

For API use, Gemini 2.5 Flash is the cost-efficient entry point for most use cases ($0.15/1M input tokens, $0.60/1M output tokens, 1M-token context).
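The ~7 GB figure quoted for Gemma 3 12B at Q4 can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a measurement: the 4.5 bits-per-weight value (4-bit quantization plus scale metadata) and the 2 GB overhead for KV cache and runtime buffers are assumptions, and the function names are hypothetical.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed fixed overhead fit in vram_gb.

    overhead_gb is an assumption standing in for KV cache and runtime
    buffers; real overhead depends on context length and the runtime.
    """
    needed_gb = quantized_weight_gb(params_billion, bits_per_weight) + overhead_gb
    return needed_gb <= vram_gb


if __name__ == "__main__":
    # 12B parameters at an assumed ~4.5 effective bits per weight.
    weights_gb = quantized_weight_gb(12, 4.5)
    print(f"Estimated weights: {weights_gb:.2f} GB")
    print("Fits on a 12 GB card:", fits_in_vram(12, 4.5, vram_gb=12))
```

Under these assumptions the weights come to roughly 6.75 GB, in line with the ~7 GB quoted above, which leaves headroom on a 12 GB card for a moderate context length.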

Deployment guidance

Gemini is API-only; there is no self-hosted deployment. It is served through Google AI Studio and Vertex AI endpoints, priced per 1M tokens: Gemini 2.5 Pro ($1.25 input, $10 output, for prompts up to 200K tokens), Gemini 2.5 Flash ($0.15 input, $0.60 output), and Gemini 2.0 Flash-Lite ($0.075 input, $0.30 output).

For self-hosted multimodality as a Gemini alternative, each modality needs a separate model: InternVL2 76B at Q4 on 2× H100 SXM for vision-language, faster-whisper large-v3 on an L4 for audio, and ComfyUI + Wan 2.1 on an RTX 4090 for video.

For mobile and edge: Gemma 3 4B via MediaPipe LLM Inference on Tensor G4 runs entirely on-device. The model is specified for a context window of up to 128K tokens, not the 1M tokens of the Gemini API models.

For Vertex AI enterprise use: Gemini 2.5 Pro with grounding (Google Search) and enterprise data residency. Typical latency is 300-800 ms, and the service is covered by the Google Cloud SLA.

For local RAG at Gemini quality: BGE-M3 retrieval plus Llama 3.3 70B generation via vLLM on 2× H100 SXM.
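To compare the three API tiers for a given workload, the per-token prices listed above can be turned into a per-request cost. This is a minimal sketch: the prices are copied from this page and may be out of date, the dictionary keys are labels chosen for this example, and `request_cost_usd` is a hypothetical helper, not part of any Google SDK.

```python
# USD per 1M tokens as (input, output), copied from the prices listed above.
PRICES_PER_MILLION = {
    "gemini-2.5-pro": (1.25, 10.00),        # rate for prompts up to 200K tokens
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the flat per-token rates above."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload (an assumption): 10,000 input and 1,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = request_cost_usd(model, input_tokens=10_000, output_tokens=1_000)
        print(f"{model}: ${cost:.5f} per request")
```

For output-heavy workloads the gap between tiers widens, because the output price is the larger term in every tier.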

Related families

Gemma, Claude (Anthropic), GPT (OpenAI)
