KEY INSIGHT
Your MVP needs to prove that users want the outcome your AI delivers, not that you can build AI. Many operators confuse "MVP" with "minimum interesting demo." Ship something users will actually pay for, even if it's ugly.
MVP architecture for local AI products follows a three-layer stack:
1. **Interface layer** (web app, WhatsApp bot, USSD menu): Minimal viable UX
2. **Inference layer** (local model or API): Enough capability for core use case
3. **Data layer** (user data, interactions, feedback): Capture everything
The temptation is to build a beautiful interface with weak AI inside. Don't. Users forgive ugly interfaces when the AI works. They don't forgive beautiful interfaces that produce wrong outputs.
```python
# MVP architecture: Local model serving with basic web interface
# This is a minimal but functional structure
"""
MVP Stack:
- Backend: FastAPI (lightweight Python web framework)
- Model: Llama-based model via Ollama or llama.cpp
- Interface: Basic HTML/JS or Telegram bot
- Data: SQLite for MVP (upgrade to Postgres when you have traction)
"""
# mvp/app.py - Core MVP structure
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
import sqlite3
import ollama
app = FastAPI()
class UserRequest(BaseModel):
user_id: str
query: str
context: Optional[str] = ""
class UserResponse(BaseModel):
response: str
tokens_used: int
latency_ms: int
def init_db():
"""Initialize SQLite database for MVP data capture"""
conn = sqlite3.connect('mvp_data.db')
c = conn.cursor()
c.execute('''
CREATE TABLE IF NOT EXISTS interactions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id TEXT,
query TEXT,
response TEXT,
tokens_used INTEGER,
latency_ms INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
conn.commit()
return conn
db_conn = init_db()
@app.post("/api/query", response_model=UserResponse)
async def process_query(request: UserRequest):
"""Core MVP endpoint: receive query, return AI response"""
import time
start = time.time()
try:
# Call local model
full_prompt = f"Context: {request.context}\n\nQuery: {request.query}"
response = ollama.generate(
model='llama3.2:3b', # Small model for MVP speed
prompt=full_prompt,
options={'num_predict': 512} # Limit output tokens
)
latency = int((time.time() - start) * 1000)
# Log interaction
c = db_conn.cursor()
c.execute('''
INSERT INTO interactions
(user_id, query, response, tokens_used, latency_ms)
VALUES (?, ?, ?, ?, ?)
''', (request.user_id, request.query, response['response'],
response['eval_count'], latency))
db_conn.commit()
return UserResponse(
response=response['response'],
tokens_used=response['eval_count'],
latency_ms=latency
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
# Run with: uvicorn mvp.app:app --host 0.0.0.0 --port 8000
```
The MVP should answer three questions within 4-6 weeks of launch:
1. Do users actually have this problem? (Engagement data)
2. Does our solution address it? (Outcome metrics)
3. Will they pay? (Conversion to paid tier)
If you can't answer these questions by week six, the MVP isn't minimal enough—you're building features before validating.