Gemini (Google)

Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind, designed to process text, images, audio, video, and code. For local AI operators, the key point is that Gemini is not available for local deployment: it is accessed through Google's hosted services, namely the Gemini API (via Google AI Studio) or Vertex AI, while gemini.google.com is the consumer chat app rather than an API endpoint. The term matters because Gemini is a closed-weight alternative to open-weight models like Llama or Mistral; operators cannot download the weights or run Gemini on their own hardware. Google has offered the models in several tiers, including Gemini Ultra (largest), Gemini Pro (mid-size), and Gemini Nano (on-device, e.g., Pixel phones). Nano is the only variant that runs locally, and it does so only as a built-in component on specific supported devices; it is not distributed as a model file that operators can load on consumer GPUs or Apple Silicon.

An operator building a local RAG pipeline might call Gemini for cloud-based embedding or generation, but cannot run it on an RTX 4090. To keep the whole pipeline on local hardware, they would use open-weight models such as Llama 3.1 8B or Mistral 7B instead. Gemini Nano does run on-device, for example on the Pixel 8 Pro for tasks like Smart Reply, but it is not available for download on other hardware.
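
As a rough sketch of that split, retrieval can stay on local hardware while generation is a swappable callable that could point either at a hosted model such as Gemini or at a local open-weight model. Every name here (`retrieve`, `build_prompt`, `answer`, the stub generator) is hypothetical, and the word-overlap scoring is only a stand-in for a real embedding search.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by how many lowercase words they share with the query.

    Toy stand-in for an embedding index; an assumption for illustration only.
    """
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # Highest overlap first; the sort is stable, so ties keep their input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in ranked[:k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def answer(question: str, documents: List[str], generate: Callable[[str], str]) -> str:
    """Run local retrieval, then hand the prompt to whichever backend is supplied."""
    context = retrieve(question, documents)
    return generate(build_prompt(question, context))


if __name__ == "__main__":
    docs = [
        "Quantization reduces model weight precision.",
        "Pixel phones run Gemini Nano.",
    ]
    # Stub backend: replace with a cloud API call or a local model call.
    print(answer("What is quantization?", docs, generate=lambda p: f"[stub reply, {len(p)} chars in]"))
```

Passing the generator in as a callable keeps the retrieval code identical whether the final step goes to Google's API or to a model running on the operator's own GPU.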

In a typical workflow, an operator testing Gemini would use the Google AI Studio web UI or the google-generativeai Python SDK. With the SDK, the sequence is: import the package (import google.generativeai as genai), configure it with an API key (genai.configure(api_key='...')), create a model handle (model = genai.GenerativeModel('gemini-1.5-pro')), and send a prompt (response = model.generate_content('Explain quantization.')). The request is processed on Google's servers; no local download or model file is involved.
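
The same call sequence can be laid out as a small script. This is a minimal sketch, assuming the google-generativeai package is installed and that the key is stored in an environment variable named `GEMINI_API_KEY`; that variable name and the helper names `resolve_api_key` and `ask_gemini` are assumptions for illustration, and the model name is carried over from the text above and may need to be swapped for one that is currently available.

```python
import os


def resolve_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the Gemini API.")
    return key


def ask_gemini(prompt: str, model_name: str = "gemini-1.5-pro") -> str:
    """Send one prompt to the hosted model and return the reply text."""
    # Imported lazily so this module still loads where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=resolve_api_key())
    model = genai.GenerativeModel(model_name)
    # The request is sent to Google's servers; nothing is downloaded locally.
    response = model.generate_content(prompt)
    return response.text


if __name__ == "__main__":
    print(ask_gemini("Explain quantization."))
```

Keeping the key in the environment avoids committing it to source control, and the script fails with a clear message before any network call if the key is missing.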

Reviewed by Fredoline Eruo. See our editorial policy.
