Notable models & companies

Grok (xAI)

Grok is a family of large language models (LLMs) developed by xAI, led by Elon Musk. The first version, Grok-1, was released in November 2023 as a 314-billion-parameter Mixture-of-Experts (MoE) model with real-time knowledge from the X (formerly Twitter) platform. Grok-2 and Grok-2 mini followed in August 2024, offering improved reasoning and coding capabilities. For operators running local AI, Grok models are not directly available for download or inference due to proprietary licensing; they are only accessible via xAI's paid subscription service (X Premium+). This contrasts with open-weight models like Llama or Mistral that can be run on consumer hardware.

Deeper dive

Grok's architecture is based on a Mixture-of-Experts (MoE) design, where only a subset of parameters is activated per token, reducing inference cost. Grok-1 had 314B total parameters with ~86B active per token. The model was trained on a custom cluster of thousands of GPUs and uses a custom tokenizer. Grok's distinguishing feature is its access to real-time data from X, enabling it to answer questions about recent events. However, this also means the model is tightly coupled to xAI's infrastructure. Unlike open-source models, Grok cannot be quantized, fine-tuned, or run locally by operators. The only way to interact with it is through the X platform's chat interface, which imposes usage limits and requires a subscription. This makes Grok irrelevant for local AI workflows, but it is a notable competitor in the commercial LLM space.

Practical example

An operator with an RTX 4090 cannot download or run Grok-2 locally—there are no model weights available for download. In contrast, they can run Llama 3.1 70B at Q4 (40 GB) with offloading, or Mistral Large 2 123B at Q3 (45 GB) with multiple GPUs. Grok remains a cloud-only service, accessible only via the X Premium+ subscription ($16/month).

Workflow example

When an operator wants to test Grok, they must visit x.ai or the X platform, log in with a Premium+ subscription, and use the chat interface. There is no API for local integration, no Ollama model, and no Hugging Face repository. The workflow is entirely web-based, with no local inference, quantization, or fine-tuning possible.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides

When it doesn't work