RUNLOCALAIv38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Load Documents with LangChain Document Loaders

Intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

LangChain installed, source documents

What this does

LangChain document loaders provide a unified interface for ingesting data from dozens of sources into Document objects that the rest of a RAG pipeline consumes. Each loader returns a list of Document instances, each holding the extracted text in page_content and a metadata dict (source path, page number, and similar).
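
A minimal sketch of the Document shape described above, assuming langchain-core is installed. The fallback dataclass is a stand-in (not part of LangChain) so the snippet still runs without it, and the sample text and metadata values are made up for illustration.

```python
from dataclasses import dataclass, field

try:
    # Real class, available once langchain-core is installed.
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, only so this sketch runs anywhere.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

# Illustrative values; a real loader fills these in from the source file.
doc = Document(
    page_content="Quarterly revenue rose in the third quarter.",
    metadata={"source": "/data/report.pdf", "page": 0},
)

summary = (
    f"{doc.metadata['source']} page {doc.metadata['page']}: "
    f"{len(doc.page_content)} chars"
)
print(summary)
```

Downstream steps such as splitters and vector stores read these same two attributes, so checking them right after loading is usually the quickest sanity test.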

Steps

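  1. Install the extras your loaders need: pypdf for PyPDFLoader, and beautifulsoup4 plus lxml for BSHTMLLoader (lxml is its default parser).

    pip install langchain-community pypdf beautifulsoup4 lxml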
    
  2. Import the loader and load documents.

    from langchain_community.document_loaders import PyPDFLoader

    # PyPDFLoader returns one Document per PDF page.
    loader = PyPDFLoader("/data/report.pdf")
    pages = loader.load()
    for page in pages:
        # "page" is zero-based; .get() returns None instead of raising if it is missing.
        print(f"Page {page.metadata.get('page')}: {len(page.page_content)} chars")
    
  3. Load from a text file or directory.

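    from langchain_community.document_loaders import TextLoader, DirectoryLoader

    # A single text file becomes a single Document.
    txt_loader = TextLoader("/data/notes.txt")
    txt_docs = txt_loader.load()

    # "**/*.txt" matches recursively. Passing loader_cls=TextLoader avoids the
    # default UnstructuredFileLoader, which needs the unstructured package.
    dir_loader = DirectoryLoader("/data/docs", glob="**/*.txt", loader_cls=TextLoader)
    dir_docs = dir_loader.load()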
    
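  4. Load HTML. BSHTMLLoader reads a local HTML file, not a URL; to fetch a live page, use WebBaseLoader.

    from langchain_community.document_loaders import BSHTMLLoader, WebBaseLoader

    # BSHTMLLoader opens a local file path (this path is a placeholder).
    html_docs = BSHTMLLoader("/data/page.html").load()

    # WebBaseLoader fetches a remote page over HTTP.
    web_docs = WebBaseLoader("https://example.com/page.html").load()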
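
When a folder mixes file types, the per-format loaders from the steps above can be selected by file suffix. The following is a minimal sketch of that idea; the LOADER_BY_SUFFIX mapping and both helper functions are hypothetical names for illustration, not part of LangChain, and load_path assumes langchain-community and the relevant extras are installed.

```python
from pathlib import Path

# Suffix -> loader class name in langchain_community.document_loaders.
# The mapping itself is an illustrative choice, not a LangChain feature.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".html": "BSHTMLLoader",
    ".htm": "BSHTMLLoader",
}


def loader_name_for(path):
    """Return the loader class name for a local file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(
            f"No loader configured for {suffix or 'files without a suffix'}"
        )
    return LOADER_BY_SUFFIX[suffix]


def load_path(path):
    """Load one local file with the matching loader and return its Documents."""
    # Imported lazily so the mapping can be inspected without LangChain installed.
    import langchain_community.document_loaders as loaders

    loader_cls = getattr(loaders, loader_name_for(path))
    return loader_cls(str(path)).load()


print(loader_name_for("/data/report.pdf"))
print(loader_name_for("/data/notes.txt"))
```

Raising on an unknown suffix, rather than silently skipping the file, makes it obvious when a directory contains a format nobody planned for.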
    

Verification

python -c "
from langchain_community.document_loaders import TextLoader
loader = TextLoader('/etc/hostname')
docs = loader.load()
print(f'Loaded {len(docs)} document(s), {len(docs[0].page_content)} chars')
"
# Expected: Loaded 1 document(s), <N> chars

Common failures

  • ImportError for PyPDFLoader. Missing langchain-community or pypdf. Install with pip install langchain-community pypdf.
  • FileNotFoundError. Verify the path exists with ls -la /data/report.pdf.
  • Empty page_content on all pages. Encrypted PDF or image-only scan. Run pdftotext (from poppler-utils) on the file first to confirm that it contains extractable text.
  • KeyError on metadata. Some loaders or pages do not set every metadata field. Read it with .get("page", 0) so a missing key falls back to a default.
  • Version mismatch. The installed package or runtime differs from the one this guide targets. Check versions with pip show langchain-community, then rerun the smallest verification command.
  • Local environment drift. A different virtual environment, interpreter, or path is in use. Print the active interpreter path with which python and confirm the file paths before changing the guide steps.
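
Two of the failures above (empty page_content and missing metadata keys) can be checked right after loading. This is a rough sketch under stated assumptions: page_number and empty_pages are hypothetical helper names, and the SimpleNamespace objects only imitate the two attributes of a LangChain Document so the example runs without a PDF on disk.

```python
from types import SimpleNamespace


def page_number(doc, default=0):
    """Read the page number without raising if the loader did not set one."""
    return doc.metadata.get("page", default)


def empty_pages(docs):
    """Return page numbers whose extracted text is empty or whitespace only."""
    return [page_number(d) for d in docs if not d.page_content.strip()]


# Stand-ins with the same two attributes as a Document (illustrative data).
docs = [
    SimpleNamespace(page_content="Intro text", metadata={"page": 0}),
    SimpleNamespace(page_content="   \n", metadata={"page": 1}),
    SimpleNamespace(page_content="Appendix", metadata={}),
]

blank = empty_pages(docs)
print(f"{len(blank)} of {len(docs)} pages have no text: {blank}")
if len(blank) == len(docs):
    print("No text extracted; the PDF may be encrypted or image-only.")
```

With real loader output, pass the list returned by loader.load() in place of the stand-in docs.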

Operator checkpoint

Before treating this as solved, record the local runtime (Python and OS version), the package versions, the hardware or backend if relevant, and the verification output. With those notes the result works as a reproducible Will-It-Run style decision rather than a one-off command transcript.
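
One way to capture those details is a short script that uses only the standard library. This is a sketch, not a required step; environment_report and package_version are hypothetical helper names, and the package list is an assumption based on the install command in this guide.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(
    packages=("langchain-community", "langchain-core", "pypdf", "beautifulsoup4"),
):
    """Collect the interpreter path, OS, and package versions as a dict."""
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

A package reported as None is not installed in the interpreter that ran the script, which is often the first sign of the environment drift described under Common failures.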

Related guides

  • How to Batch Ingest Documents into Vector Database
  • How to Extract Metadata from Documents for Filtering