by Google DeepMind
Google's closed-weight family — Gemini Pro, Gemini Ultra, Gemini Flash. Open-weight derivatives ship as [Gemma](/families/gemma). Reference baseline for capability comparisons.
Gemini is closed-weight: there are no open-weight Gemini models for local deployment. Google's open-weight alternative is Gemma 3 12B (Gemini-distilled, released under Google's own Gemma Terms of Use rather than Apache 2.0), which runs on an RTX 3060 12GB at Q4 (~7 GB of weights). Gemma 3 captures ~50-60% of Gemini 2.0 Flash quality on benchmarks at roughly 1/200th the parameter count (an estimate, since Google does not publish Gemini parameter counts), which makes it the pragmatic local entry point for Gemini-style capability. For true Gemini-level performance with open weights, DeepSeek V4 at FP8 on 8× H100 SXM matches Gemini 2.5 Pro on math and code. Gemini's native multimodality (text, image, audio, and video input in a single model) has no open-weight equivalent: InternVL2 handles text and images, Whisper handles audio, and Wan handles video, but no single open-weight model processes all modalities natively. Gemini 2.5 Flash is the cost-efficient API entry point for most use cases ($0.15/1M input tokens, $0.60/1M output tokens, with a 1M-token context).
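The ~7 GB figure for Gemma 3 12B at Q4 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only a rough estimate: the 4.5 bits per weight (Q4 formats store scales alongside the 4-bit values) and the 2 GB allowance for KV cache and runtime buffers are assumptions, not measured values, and the function names are illustrative.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Check whether weights plus an assumed overhead fit in the given VRAM.

    overhead_gb is an assumed allowance for KV cache and runtime buffers;
    real usage depends on context length and the inference runtime.
    """
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Gemma 3 12B at an assumed ~4.5 bits per weight on a 12 GB card.
    size = estimate_weight_gb(12, 4.5)
    print(f"Estimated weights: {size:.2f} GB")
    print("Fits in 12 GB:", fits_in_vram(12, 4.5, vram_gb=12))
```

Under these assumptions the weights come to about 6.75 GB, consistent with the ~7 GB quoted above, which leaves headroom on a 12 GB card for a moderate context length.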
Gemini is API-only, with no self-hosted deployment. It is served through Google AI Studio and Vertex AI endpoints: Gemini 2.5 Pro ($1.25/1M input tokens, $10/1M output tokens, for prompts up to 200K tokens), Gemini 2.5 Flash ($0.15/1M input, $0.60/1M output), and Gemini 2.0 Flash-Lite ($0.075/1M input, $0.30/1M output). For a self-hosted alternative to Gemini's multimodality, use InternVL2 76B at Q4 on 2× H100 SXM for vision-language, faster-whisper large-v3 on an L4 for audio, and ComfyUI with Wan 2.1 on an RTX 4090 for video; each modality is a separate model. For mobile and edge, Gemma 3 4B via MediaPipe LLM Inference on Tensor G4 runs entirely on-device (Gemma 3 4B's context window is 128K tokens, not Gemini's 1M). For Vertex AI enterprise use, Gemini 2.5 Pro offers grounding with Google Search and enterprise data residency, with 300-800 ms latency under the Google Cloud SLA. For local RAG at Gemini quality, pair BGE-M3 retrieval with Llama 3.3 70B generation via vLLM on 2× H100 SXM.
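To compare the API tiers for a given workload, the per-million-token prices listed above can be turned into a per-request cost estimate. This is a minimal sketch: the dictionary keys are illustrative labels rather than official API model IDs, the prices are copied from this page and may be out of date, and the Gemini 2.5 Pro rate is assumed to apply only to prompts up to 200K tokens.

```python
# (input_price, output_price) in USD per 1M tokens, as listed on this page.
# Keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "gemini-2.5-pro": (1.25, 10.00),  # assumed valid for prompts up to 200K tokens
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: a 10,000-token prompt with a 1,000-token answer.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 10_000, 1_000):.5f} per request")
```

For output-heavy workloads the output price dominates, so the gap between Pro and Flash widens as responses get longer.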
Gemini itself is API-only, so verify that the open-weight alternative you choose runs on your specific hardware before committing money.