18. Research System Project

Chapter 18 of 18 · 15 min

KEY INSIGHT

Building a research system synthesizes everything in this course: problem definition, system design, implementation, evaluation, and communication. The process reveals gaps that isolated exercises cannot. This final chapter provides a structured project that applies the course material holistically. The project scope is deliberately bounded: sufficient for demonstration, not for publication.

### Project Specification

**Objective**: Build a research system that answers questions using a retrieval-augmented approach over a domain-specific corpus.

**Core Components**:

```python
# project_architecture.py
"""
research_system/
├── src/
│   ├── __init__.py
│   ├── retrieval/            # Document retrieval module
│   │   ├── __init__.py
│   │   ├── indexer.py        # Build document index
│   │   └── searcher.py       # Query index
│   ├── generation/           # Answer synthesis module
│   │   ├── __init__.py
│   │   └── synthesizer.py    # Combine retrieved context
│   └── evaluation/           # Assessment module
│       ├── __init__.py
│       └── metrics.py        # Accuracy, latency, coverage
├── tests/
│   ├── test_retrieval.py
│   ├── test_generation.py
│   └── test_integration.py
├── docs/
│   ├── README.md
│   ├── architecture.md
│   └── evaluation.md
├── scripts/
│   ├── index_corpus.py
│   └── run_benchmark.py
├── data/
│   └── sample_corpus/        # Domain-specific data
└── requirements.txt
"""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


# Supporting types. The fields are illustrative; adapt them to your corpus.
@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class TestCase:
    question: str
    relevant_doc_ids: set[str]
    reference_answer: str


@dataclass
class EvaluationResult:
    accuracy: float
    precision_at_5: float
    latencies_ms: list[float]


# Key interfaces
class DocumentIndex:
    def build(self, documents: list[Document]) -> None:
        """Build the index from documents."""
        ...

    def search(self, query: str, top_k: int = 5) -> list[tuple[Document, float]]:
        """Return the top_k (document, score) pairs, highest score first."""
        ...


class AnswerSynthesizer:
    def __init__(self, model_path: str) -> None:
        """Initialize with the specified model."""
        ...

    def generate(self, question: str, context: list[Document]) -> str:
        """Generate an answer to the question, grounded in the context."""
        ...


class ResearchSystem(Protocol):
    # Hypothetical entry point; the specification does not fix this signature.
    def answer(self, question: str) -> tuple[str, list[Document]]:
        """Return the answer and the documents retrieved to produce it."""
        ...


class EvaluationSuite:
    def run(self, system: ResearchSystem, test_set: list[TestCase]) -> EvaluationResult:
        """Run the full evaluation over every test case."""
        ...
```

### Requirements

1. **Retrieval**: Index a corpus of at least 1,000 documents and retrieve relevant documents for arbitrary queries with precision@5 above 70% (on average, more than 70% of the top five results are relevant).
2. **Generation**: Generate coherent answers that incorporate the retrieved context. No hallucinated facts: every claim in an answer must be supported by the context.
3. **Evaluation**: Produce quantitative metrics, including accuracy, latency, and retrieval precision, and compare them against a simple baseline (e.g., TF-IDF retrieval).
4. **Documentation**: Provide a README covering installation, usage, and architecture, plus inline documentation for all public interfaces.
5. **Benchmarking**: Measure performance across at least 100 queries; report the latency distribution (not just the mean) and accuracy metrics.

### Evaluation Criteria

| Component | Criteria | Weight |
|-----------|----------|--------|
| Retrieval | Accuracy, relevance quality | 25% |
| Generation | Answer quality, faithfulness to context | 25% |
| Code Quality | Structure, documentation, tests | 20% |
| Evaluation | Rigorous benchmarking, statistical reporting | 15% |
| Communication | README clarity, presentation | 15% |

### Common Pitfalls

**Over-engineering the index**: Start simple. A working TF-IDF baseline with 60% accuracy is better than a broken dense retriever with a theoretical 90% accuracy.

**Skipping the baseline**: Without a comparison, results are uninterpretable. Always have a simple baseline to beat.

**Ignoring latency**: A system that works but takes 30 seconds per query won't be used. Measure latency, then optimize.

**Undocumented limitations**: Be explicit about what your system cannot do. This is not weakness; it is honest engineering.
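The TF-IDF baseline recommended in the requirements and pitfalls can be small. The sketch below is one possible version, not a prescribed implementation. It assumes lowercase alphanumeric tokenization, raw term counts, a smoothed IDF of the form log((1 + n) / (1 + df)) + 1, and cosine scoring. `TfidfIndex` is a hypothetical name, and its `Document` is a stripped-down stand-in for the project's own type. The class mirrors the `build`/`search` shape of the `DocumentIndex` interface from the project specification.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    # Minimal stand-in for the project's Document type (assumed fields).
    doc_id: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; deliberately simple.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Hypothetical TF-IDF baseline exposing DocumentIndex-style build/search."""

    def __init__(self) -> None:
        self.documents: list[Document] = []
        self.idf: dict[str, float] = {}
        self.vectors: list[dict[str, float]] = []

    def build(self, documents: list[Document]) -> None:
        """Compute smoothed IDF weights and one unit-length vector per document."""
        self.documents = list(documents)
        token_lists = [tokenize(d.text) for d in self.documents]
        n = len(token_lists)
        # Document frequency: how many documents contain each term.
        df = Counter(term for tokens in token_lists for term in set(tokens))
        # Smoothed IDF keeps every indexed term's weight strictly positive.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(tokens) for tokens in token_lists]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        """Term count times IDF, L2-normalized; terms unseen at build time are dropped."""
        counts = Counter(tokens)
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query: str, top_k: int = 5) -> list[tuple[Document, float]]:
        """Return the top_k (document, cosine score) pairs, best first."""
        q = self._vectorize(tokenize(query))
        scored = [
            # Both vectors have unit length, so the dot product is the cosine.
            (doc, sum(w * vec.get(t, 0.0) for t, w in q.items()))
            for doc, vec in zip(self.documents, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = TfidfIndex()
    index.build([
        Document("d1", "Transformers use self-attention over token sequences."),
        Document("d2", "BM25 is a bag-of-words ranking function."),
        Document("d3", "Convolutional networks share weights across space."),
    ])
    for doc, score in index.search("how does self-attention work", top_k=2):
        print(doc.doc_id, round(score, 3))
```

Scoring every document is linear in corpus size. For a few thousand documents that is usually acceptable for a baseline, and it keeps the comparison target easy to audit. An inverted index or a library vectorizer would be the natural next step if latency becomes a problem.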
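The retrieval requirement is stated as precision at top-5, so it helps to pin down exactly how that number is computed before comparing systems. The minimal sketch below assumes relevance judgments are available as a set of relevant document IDs per query. The function names are hypothetical. It divides by k rather than by the number of results returned, so a retriever that returns fewer than five documents is penalized. One consequence: a query with fewer than five relevant documents cannot reach 1.0.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved document IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    # Divide by k (not by the number returned): missing results count as misses.
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Macro-average precision@k over queries; each run is (retrieved_ids, relevant_ids)."""
    if not runs:
        raise ValueError("no queries to evaluate")
    return sum(precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d7", "d3", "d9", "d4"], {"d1", "d3", "d4"}),        # 3 of top 5 relevant
        (["d2", "d5", "d8", "d6", "d0"], {"d2", "d5", "d8", "d6"}),  # 4 of top 5 relevant
    ]
    print(mean_precision_at_k(runs, k=5))  # mean of 0.6 and 0.8
```

Running the same function over the baseline's and the full system's outputs gives directly comparable numbers for the >70% target.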
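The evaluation criteria reward statistical reporting, and a single averaged number hides how much a system-versus-baseline gap depends on which queries happen to be in the test set. One common option is a paired percentile bootstrap over per-query scores, for example each query's precision@5 under both systems. The sketch below assumes scores are paired by query index. `paired_bootstrap_ci` is a hypothetical name, and the 10,000 resamples, 95% level, and fixed seed are arbitrary defaults.

```python
import random


def paired_bootstrap_ci(
    system_scores: list[float],
    baseline_scores: list[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Mean per-query difference (system - baseline) and a percentile bootstrap CI.

    Scores must be paired: index i of both lists refers to the same query.
    """
    if not system_scores or len(system_scores) != len(baseline_scores):
        raise ValueError("need non-empty, equal-length, paired score lists")
    diffs = [s - b for s, b in zip(system_scores, baseline_scores)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the reported interval is reproducible
    # Resample queries with replacement and record the mean difference each time.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return sum(diffs) / n, lower, upper
```

If the whole interval lies above zero, the improvement over the baseline is unlikely to be an artifact of query sampling. If it straddles zero, that belongs in the write-up as a limitation.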
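The benchmarking requirement asks for a latency distribution rather than an average, because a few slow queries can make a system unusable even when the mean looks fine. The sketch below times a callable over a query list with `time.perf_counter` and summarizes the distribution with `statistics.quantiles`. `benchmark_latency` and `answer_fn` are hypothetical names: `answer_fn` stands for whatever wraps your system's end-to-end query path. The warm-up of three untimed calls is also an assumption, meant to keep one-off costs such as lazy model loading out of the measurements.

```python
import statistics
import time
from typing import Callable


def benchmark_latency(
    answer_fn: Callable[[str], object],
    queries: list[str],
    warmup: int = 3,
) -> dict[str, float]:
    """Time answer_fn on each query and summarize the latency distribution in ms."""
    if len(queries) < 2:
        raise ValueError("need at least two queries for percentiles")
    # Untimed warm-up calls absorb caches and lazy initialization.
    for q in queries[:warmup]:
        answer_fn(q)
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        answer_fn(q)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; "inclusive" keeps every percentile within [min, max].
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(latencies_ms),
    }
```

With the minimum of 100 queries, the p99 estimate rests on roughly one observation, so p50 and p95 are the more stable numbers to report alongside the maximum.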
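The generation requirement forbids facts that the retrieved context does not support. Checking that rigorously needs human judgment or a dedicated model, but a crude lexical screen can surface candidates for manual error analysis. The sketch below is that kind of heuristic, not an established faithfulness metric. It flags answer sentences whose non-stopword tokens mostly do not appear anywhere in the context. `unsupported_sentences` is a hypothetical name. The stopword list, the regex sentence splitter, and the 0.6 overlap threshold are all arbitrary assumptions.

```python
import re

# Small, arbitrary stopword list; extend it for your domain.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "was", "were", "with",
})


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(answer: str, context: list[str], min_overlap: float = 0.6) -> list[str]:
    """Return answer sentences whose content words are mostly absent from the context."""
    context_vocab: set[str] = set().union(*(content_words(c) for c in context))
    flagged = []
    # Naive split after sentence-final punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & context_vocab) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    context = ["BM25 is a bag-of-words ranking function used by search engines."]
    answer = "BM25 is a bag-of-words ranking function. It was invented on Mars in 1850."
    print(unsupported_sentences(answer, context))
```

Lexical overlap misses paraphrased hallucinations and can flag correct paraphrases. Treat its output as a list of sentences to inspect by hand, and report it that way rather than as a faithfulness score.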

EXERCISE

Complete the research system project following the specification. Document every decision: why this retrieval approach, why this model, and what the error analysis revealed. Present the final system, including a live demo, benchmark results, and an honest discussion of its limitations.