Text-to-3D Generation
Generating 3D models from text prompts. Hunyuan3D-2, TRELLIS, and Stable Fast 3D lead the open-weight field in 2026.
Setup walkthrough
- pip install gradio, then git clone https://github.com/Tencent/Hunyuan3D-2 (Hunyuan3D-2 — SOTA open-weight text-to-3D).
- Download the model weights (~5 GB for the base model, ~10 GB for the full pipeline with texture generation).
- The pipeline: text prompt → multi-view diffusion (generates 6 views of the object) → 3D reconstruction (creates mesh from views) → texture generation (UV-unwraps and textures the mesh).
- CLI:
python inference.py --prompt "a wooden chair with carved armrests" --output chair.glb
- First 3D model in 2-10 minutes on a 12+ GB GPU. Output is a textured GLB file (a standard format; opens in Blender, Unity, and Unreal).
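For batch asset generation, the CLI invocation above is easy to wrap in a short script. A minimal sketch, assuming the repo's inference.py with the --prompt/--output flags shown in the walkthrough (the helper names and output layout here are illustrative):

```python
import subprocess
from pathlib import Path

def build_cmd(prompt: str, output: Path) -> list[str]:
    """Build one Hunyuan3D-2 CLI invocation (flags as in the walkthrough)."""
    return [
        "python", "inference.py",
        "--prompt", prompt,
        "--output", str(output),
    ]

def generate_batch(prompts: list[str], out_dir: str = "assets") -> list[Path]:
    """Run the CLI once per prompt; each run takes minutes on a 12+ GB GPU."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    results = []
    for i, prompt in enumerate(prompts):
        glb = out / f"asset_{i:03d}.glb"
        subprocess.run(build_cmd(prompt, glb), check=True)  # blocks until done
        results.append(glb)
    return results
```

Run it from the repo root so inference.py resolves; adjust the path if your checkout differs.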
- For a lighter/faster option: pip install shap-e (OpenAI Shap-E, ~1 GB) — generates simple 3D shapes from text in 10-30 seconds on CPU. Lower quality, much faster.
- Alternative: TripoSR (pip install triposr) — image-to-3D, but usable with text via a text→image→3D pipeline.
The cheap setup
Used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs Hunyuan3D-2 at 5-15 minutes per model. Shap-E runs at 30-60 seconds per model on CPU. For $400: you can generate simple 3D assets (furniture, props, basic characters) for game dev and prototyping. For high-quality textured models: Hunyuan3D-2 on 12 GB works but the multi-view diffusion stage strains VRAM — expect occasional OOM errors on complex prompts. Text-to-3D at $400 works for prototyping; production-quality models need more VRAM or cloud services.
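The occasional OOM on complex prompts can be softened by retrying the heavy stage at a lower resolution instead of failing the whole job. A framework-agnostic sketch — with_fallback, the resolution ladder, and the exception types are all illustrative assumptions; pass whatever your framework actually raises on OOM (PyTorch raises a RuntimeError subclass):

```python
def with_fallback(run, resolutions=(1024, 768, 512), oom_errors=(RuntimeError,)):
    """Call run(resolution) at decreasing resolutions until one fits in VRAM."""
    last = None
    for res in resolutions:
        try:
            return run(res)
        except oom_errors as exc:
            last = exc  # out of memory at this size: retry smaller
    raise last  # even the smallest setting did not fit
```

The trade is explicit: a smaller multi-view resolution costs detail but keeps the overnight batch running unattended.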
The serious setup
Used RTX 3090 24 GB ($700-900, see /hardware/rtx-3090). Runs Hunyuan3D-2 comfortably at 2-5 minutes per model — the full pipeline (multi-view + mesh + texture) fits in 24 GB. For a game asset pipeline generating 20-50 props/day, one RTX 3090 handles it. For high-quality character models: 24 GB enables the highest resolution multi-view diffusion. Total: ~$1,800-2,200. RTX 4090 24 GB ($1,600) drops generation to 1-3 minutes per model — fast enough for interactive prototyping. Text-to-3D is a "generate, review, refine" loop — faster GPU = faster iteration.
Common beginner mistake
The mistake: Generating a 3D model from text, importing it into a game engine or 3D-printer slicer, and wondering why it has 500K triangles, inverted normals, and non-manifold geometry.

Why it fails: AI-generated meshes prioritize visual appearance over geometric correctness. The mesh looks right from the generated views but has topological issues: non-manifold edges, self-intersecting faces, inconsistent normals, and absurd triangle counts (a simple chair shouldn't need 500K tris).

The fix: Always post-process AI-generated meshes. Import into Blender → Decimate modifier (reduce to 5-10K tris for game assets) → Recalculate Normals → 3D Print Toolbox (check for non-manifold geometry) → manual cleanup. AI generates the rough shape; you optimize for the target platform. A raw AI mesh is a starting point, not a deliverable. Budget 10-30 minutes of manual cleanup per AI-generated model.
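Two of those defects — non-manifold edges and inconsistent winding (the cause of flipped normals) — can be flagged programmatically before you ever open Blender. A dependency-free sketch (mesh_report is a hypothetical helper, not part of any tool named above): an edge shared by more than two triangles is non-manifold, and an edge traversed in the same direction by two triangles means their windings disagree.

```python
from collections import Counter

def mesh_report(faces):
    """Flag non-manifold edges and inconsistent winding in a triangle list.

    faces: list of (i, j, k) vertex-index triangles. In a clean mesh every
    interior edge is shared by exactly two faces, once per direction.
    """
    undirected = Counter()  # edge -> number of incident faces
    directed = Counter()    # oriented edge -> traversal count
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            undirected[frozenset((u, v))] += 1
            directed[(u, v)] += 1
    non_manifold = [tuple(e) for e, n in undirected.items() if n > 2]
    flipped = [e for e, n in directed.items()
               if n > 1 and undirected[frozenset(e)] <= 2]
    return {"tris": len(faces),
            "non_manifold_edges": non_manifold,
            "inconsistent_winding": flipped}
```

This catches topology errors only; decimation and self-intersection checks still belong in Blender.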
Recommended setup for text-to-3D generation
Browse all tools for runtimes that fit this workload.
Reality check
Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.
Common mistakes
- Buying for spec-sheet VRAM without modeling KV cache + activation overhead
- Underestimating quantization quality loss below Q4
- Skipping flash-attention support (real perf gap on long context)
- Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
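The first bullet can be made concrete with the standard transformer KV-cache formula. A back-of-envelope sketch — the model numbers in the test are illustrative, and the 1.5 GiB runtime-overhead figure is a working assumption, not a measurement:

```python
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bytes_per=2, batch=1):
    """KV cache bytes: 2 (K and V) x layers x kv_heads x head_dim
    x context length x bytes per element (2 for fp16), as GiB."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per * batch / 2**30

def fits(vram_gib, weights_gib, layers, kv_heads, head_dim, seq_len,
         overhead_gib=1.5):
    """Rough fit check: weights + KV cache + assumed runtime overhead."""
    need = weights_gib + kv_cache_gib(layers, kv_heads, head_dim, seq_len) + overhead_gib
    return need <= vram_gib, round(need, 2)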
What breaks first
The errors most operators hit when running text-to-3D generation locally. Each links to a diagnose+fix walkthrough.
Before you buy
Verify your specific hardware can handle text-to-3D generation before committing money.