How to Load Documents with LangChain Document Loaders

What this does

LangChain document loaders provide a unified interface for ingesting data from dozens of sources into Document objects that the rest of a RAG pipeline consumes. Each loader's load() method returns a list of Document instances, each holding the extracted text in page_content and a metadata dict (most loaders set at least a "source" key).

Steps

Install the extra packages your loader needs (pypdf for PDFs; beautifulsoup4, plus lxml for BSHTMLLoader's default parser, for HTML).

pip install langchain-community pypdf beautifulsoup4 lxml
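Pip names and import names do not always match (beautifulsoup4 is imported as bs4), so a quick pre-flight check can save a confusing ImportError later. This is a stdlib-only sketch; the REQUIRED mapping and the missing_packages helper are hypothetical names, and you should edit the mapping to match the loaders you actually use.

```python
import importlib.util

# pip package name -> top-level import name (they differ for beautifulsoup4)
REQUIRED = {
    "langchain-community": "langchain_community",
    "pypdf": "pypdf",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",  # BSHTMLLoader's default BeautifulSoup parser
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All loader dependencies are importable")
```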

Import the loader and load documents.

from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document

loader = PyPDFLoader("/data/report.pdf")
pages: list[Document] = loader.load()  # one Document per PDF page
for page in pages:
    # PyPDFLoader stores a 0-based page index under metadata["page"]
    print(f"Page {page.metadata.get('page')}: {len(page.page_content)} chars")
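Because PyPDFLoader's page index starts at 0, printed page numbers are off by one relative to a PDF viewer. The sketch below is a hypothetical helper (page_stats is not a LangChain API) that converts to 1-based numbers and falls back when the key is missing. It accepts any iterable, so you can pass loader.lazy_load() to stream pages instead of holding the whole list in memory.

```python
from typing import Iterable, Iterator, Protocol, Tuple


class PageLike(Protocol):
    """Anything shaped like a LangChain Document."""

    page_content: str
    metadata: dict


def page_stats(pages: Iterable[PageLike]) -> Iterator[Tuple[int, int]]:
    """Yield (1-based page number, character count) for each page.

    Assumes a 0-based index under metadata["page"], as PyPDFLoader sets;
    pages without the key are reported as page 1.
    """
    for page in pages:
        index = page.metadata.get("page", 0)
        yield index + 1, len(page.page_content)


# Usage with a real PDF (path is the guide's placeholder):
# loader = PyPDFLoader("/data/report.pdf")
# for number, chars in page_stats(loader.lazy_load()):
#     print(f"Page {number}: {chars} chars")
```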

Load from a text file or directory.

from langchain_community.document_loaders import TextLoader, DirectoryLoader

txt_loader = TextLoader("/data/notes.txt")
txt_docs = txt_loader.load()

dir_loader = DirectoryLoader("/data/docs", glob="**/*.txt", loader_cls=TextLoader)
dir_docs = dir_loader.load()

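Load HTML. BSHTMLLoader parses a local HTML file and does not fetch URLs; use WebBaseLoader to load a page over HTTP.

from langchain_community.document_loaders import BSHTMLLoader, WebBaseLoader

# BSHTMLLoader reads a local file path; its default parser requires lxml
html_docs = BSHTMLLoader("/data/page.html").load()

# WebBaseLoader fetches the URL with requests, then parses it with BeautifulSoup
web_docs = WebBaseLoader("https://example.com/page.html").load()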

Verification

python -c "
from langchain_community.document_loaders import TextLoader
loader = TextLoader('/etc/hostname')
docs = loader.load()
print(f'Loaded {len(docs)} document(s), {len(docs[0].page_content)} chars')
"
# Expected: Loaded 1 document(s), <N> chars
# (/etc/hostname exists on most Linux systems; elsewhere, point TextLoader at any small text file)

Common failures

ImportError for PyPDFLoader. Missing langchain-community or pypdf. Install with pip install langchain-community pypdf.
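File not found. The exception depends on the loader: PyPDFLoader raises ValueError ("File path ... is not a valid file or url"), and TextLoader wraps the FileNotFoundError in a RuntimeError. Verify the path exists with ls -la /data/report.pdf.
Empty page_content on all pages. The PDF is likely an image-only scan with no text layer, or it is encrypted. Run pdftotext /data/report.pdf - to check whether any text can be extracted; scanned PDFs need OCR first.
KeyError on metadata. Some documents lack the key you index (for example metadata["page"]). Use metadata.get("page", 0) with a fallback default.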
Version mismatch. The installed package or Python runtime differs from the one the commands assume; check versions first (pip show langchain-community) and rerun the smallest verification command.
Local environment drift. A different service, virtual environment, or path is in use; print the active interpreter (which python) and configuration before changing the guide steps.
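Since the missing-file error type varies by loader, one option is to check the path yourself before constructing the loader, and to flag empty pages right after loading. Both helpers below are hypothetical (not part of LangChain) and duck-typed, so they work on Documents or any object with a page_content string.

```python
from pathlib import Path
from typing import Iterable, List


def require_file(path: str) -> Path:
    """Raise a plain FileNotFoundError before any loader gets a chance to wrap it."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    return p


def empty_pages(pages: Iterable) -> List[int]:
    """Return 0-based positions of pages whose text is empty or whitespace only.

    If every position comes back, the PDF is probably an image-only scan.
    """
    return [i for i, page in enumerate(pages) if not page.page_content.strip()]


# Usage sketch (path is the guide's placeholder):
# pdf_path = require_file("/data/report.pdf")
# pages = PyPDFLoader(str(pdf_path)).load()
# blank = empty_pages(pages)
# if blank and len(blank) == len(pages):
#     print("No text layer found; run OCR before loading")
```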

Operator checkpoint

Before treating this as solved, write down the local runtime, package versions, hardware or backend if relevant, and the verification output. Recording these keeps the guide useful as a Will-It-Run style decision rather than a one-off command transcript.
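The checkpoint facts can be gathered with the standard library alone. This sketch (environment_report and the PACKAGES list are hypothetical names) records the interpreter path, which also answers the "which environment is active" question from the environment-drift failure above, plus the installed version of each package.

```python
import platform
import sys
from importlib import metadata

PACKAGES = ["langchain-community", "langchain-core", "pypdf", "beautifulsoup4"]


def environment_report(packages=PACKAGES) -> dict:
    """Collect the runtime facts the checkpoint asks you to write down."""
    report = {
        "python": sys.version.split()[0],
        "executable": sys.executable,  # the interpreter (and venv) actually in use
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Paste the printed lines next to your verification output.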
