RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Notable models & companies

GPT-4

GPT-4 is a large multimodal language model developed by OpenAI and released in March 2023. It accepts text and image inputs and produces text outputs. For operators, the key fact is that GPT-4 is proprietary and closed-weight: it is reachable only through hosted services such as OpenAI's API or a ChatGPT Plus subscription, and it cannot be downloaded or self-hosted, unlike open-weight models such as Llama 3.1 or Mistral. OpenAI has not published the model's size; unconfirmed estimates put it at about 1.7 trillion parameters in a mixture-of-experts (MoE) architecture, which would be far too large for consumer hardware even with quantization. Operators encounter GPT-4 mainly as an API-based reference point when comparing locally runnable models on tasks like coding, reasoning, or creative writing.

Deeper dive

GPT-4 is a significant step up from GPT-3.5 in reasoning, factual accuracy, and steerability. It is multimodal: it can process images (e.g., diagrams, screenshots) as well as text. OpenAI has not released official architecture details or parameter counts. Leaks and third-party analyses suggest a transformer with a mixture-of-experts (MoE) design of about 1.7 trillion total parameters across 8 experts, with only a subset (around 280 billion) active per token, which would reduce inference cost; none of these figures is confirmed. GPT-4 has shipped in several variants: the original GPT-4, GPT-4 Turbo (cheaper and faster, with an April 2023 knowledge cutoff at launch), and GPT-4o ("omni", natively multimodal and faster still). For operators, the key takeaway is that GPT-4 is a closed, API-only model. Its performance sets a benchmark for local models, but the trade-offs are per-token cost, network latency, and sending prompts to a third party. Open-weight models such as Llama 3.1 70B and Mixtral 8x22B approach GPT-4 quality on many tasks. At 4-bit quantization, Llama 3.1 70B fits on a single 48 GB GPU; Mixtral 8x22B, with about 141 billion total parameters, needs roughly 80 GB or more and therefore multiple GPUs or CPU offloading.
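
The sizing claims above can be sanity-checked with a rough rule of thumb: memory for the weights is about parameter count times bits per weight, divided by 8. The sketch below assumes 4.5 bits per weight (roughly what a 4-bit block format with per-block scales works out to) and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name is illustrative, and the 1.7 trillion figure is the unconfirmed estimate, not an official number.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed bits per weight for a 4-bit quantized model (an approximation).
BITS_PER_WEIGHT = 4.5

models = [
    ("Llama 3.1 70B", 70),
    ("Mixtral 8x22B", 141),
    ("GPT-4 (unconfirmed estimate)", 1700),
]

for name, params_b in models:
    gb = weight_memory_gb(params_b, BITS_PER_WEIGHT)
    print(f"{name}: ~{gb:.0f} GB of weights at {BITS_PER_WEIGHT} bits/weight")
```

Under these assumptions, only the 70B model lands below 48 GB, and an MoE model still has to hold all of its experts in memory even though only a subset is active per token.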

Practical example

An operator comparing GPT-4 to a local model might start with cost. GPT-4 Turbo via the API is priced at $0.01 per 1K input tokens and $0.03 per 1K output tokens, so a 1,000-token prompt with a 500-token response costs $0.01 + $0.015 = $0.025. In contrast, running Llama 3.1 70B Q4 locally on an RTX 6000 Ada (48 GB VRAM) costs only electricity (assumed here at $0.10/hour) and yields ~15 tok/s. For a batch of 100 such queries, GPT-4 costs $2.50. Locally, generating the 50,000 output tokens at 15 tok/s takes about 56 minutes, or ~$0.10 in electricity, but requires an upfront hardware investment.
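
The arithmetic above is easy to rerun with your own prices and throughput. This is a minimal sketch using the figures from the example; the function names are illustrative, and the local estimate counts generation time only (no prompt processing, idle power, or hardware amortization).

```python
def api_cost(input_tokens: int, output_tokens: int,
             in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Cost in dollars of one API call billed per 1K tokens."""
    return (input_tokens / 1000 * in_price_per_1k
            + output_tokens / 1000 * out_price_per_1k)


def local_cost(output_tokens: int, tokens_per_sec: float,
               power_cost_per_hour: float) -> float:
    """Electricity cost in dollars, counting generation time only."""
    hours = output_tokens / tokens_per_sec / 3600
    return hours * power_cost_per_hour


QUERIES = 100

per_query = api_cost(1000, 500, in_price_per_1k=0.01, out_price_per_1k=0.03)
batch_api = QUERIES * per_query
batch_local = local_cost(QUERIES * 500, tokens_per_sec=15,
                         power_cost_per_hour=0.10)

print(f"API, per query:   ${per_query:.3f}")
print(f"API, 100 queries: ${batch_api:.2f}")
print(f"Local, 100 queries (electricity only): ${batch_local:.2f}")
```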

Workflow example

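In practice, an operator might call GPT-4 through the OpenAI API from a Python script. With version 1.0 or later of the openai package: from openai import OpenAI; client = OpenAI(); response = client.chat.completions.create(model="gpt-4", messages=[{"role": "user", "content": "Explain quantum computing"}]). (The older openai.ChatCompletion.create call works only with package versions before 1.0.) The response reports token usage in response.usage.total_tokens. For a local alternative, the operator would use Ollama: ollama run llama3.1:70b --verbose prints timing statistics, including the generation rate in tokens per second, after each response. The choice between GPT-4 and local models hinges on budget, latency tolerance, and data privacy requirements.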
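
Local throughput can also be measured from a script instead of reading the --verbose output. The sketch below assumes Ollama is serving on its default local port and uses its /api/generate endpoint, whose non-streaming reply includes eval_count (tokens generated) and eval_duration (in nanoseconds). The helper names are illustrative, and the sample reply at the end uses made-up numbers so the calculation can be checked without a running server.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation rate from an Ollama reply; eval_duration is in nanoseconds."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "llama3.1:70b",
             url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming request to Ollama and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())


# Offline check with made-up numbers shaped like an Ollama reply.
sample = {"eval_count": 500, "eval_duration": 33_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

With a server running, tokens_per_second(generate("Explain quantum computing")) would report the measured rate for that prompt.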

Reviewed by Fredoline Eruo. See our editorial policy.
