04. Relation Extraction
Relation extraction identifies semantic connections between pairs of entities in text. Entity recognition locates individual mentions. Relation extraction goes further and determines how those entities relate: for example, whether a person WORKS_FOR a company, or whether a purchase date belongs to a specific product.
Predicate-based extraction uses prompt templates that list the allowed relation types explicitly. Common relation types include LOCATED_IN, WORKS_FOR, ACQUIRED_BY, MARRIED_TO, and FOUNDED_BY. Output formats range from simple delimited triples to nested JSON structures that also record confidence scores and temporal qualifiers, such as when a relation began.
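```python
import re

import ollama

# Relation inventory shared by the prompt and the parser
ALLOWED_RELATIONS = ("WORKS_FOR", "LOCATED_IN", "FOUNDED_BY", "ACQUIRED_BY")

def extract_relations(text, model="llama3"):
    relation_list = ", ".join(ALLOWED_RELATIONS)
    prompt = f"""Identify all relationships between entities in the text.
Write one relation per line, formatted as: Entity1 - RELATION - Entity2
Possible relations: {relation_list}
Text: {text}
Relations:"""
    result = ollama.generate(model=model, prompt=prompt)
    return parse_relations(result["response"])

def parse_relations(response):
    relations = []
    for line in response.strip().splitlines():
        # Drop list markers such as "-", "*", "1." or "2)" that models often prepend
        line = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line)
        # Split on the delimiter and strip any [brackets] echoed from the prompt
        parts = [p.strip().strip("[]").strip() for p in line.split(" - ")]
        # Keep only well-formed triples that use an allowed relation type
        if len(parts) == 3 and parts[1].upper() in ALLOWED_RELATIONS:
            relations.append({
                "entity1": parts[0],
                "relation": parts[1].upper(),
                "entity2": parts[2],
            })
    return relations
```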
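The line-based format breaks down once relations carry extra fields. For the nested JSON format mentioned earlier, a parser might look like the sketch below. The schema (head, relation, tail, confidence, valid_from) and the function name parse_json_relations are hypothetical choices for illustration, not a standard; the prompt would have to ask the model for exactly this shape. The parser tolerates chatter around the JSON object and skips malformed items instead of failing the whole response.

```python
import json

# Hypothetical schema the prompt would request (not a standard):
# {"relations": [{"head": ..., "relation": ..., "tail": ...,
#                 "confidence": 0.0-1.0, "valid_from": "YYYY" or null}]}
REQUIRED_KEYS = {"head", "relation", "tail"}

def parse_json_relations(response, min_confidence=0.0):
    """Parse a JSON relation list from raw model output, skipping malformed items."""
    # Models often wrap JSON in prose, so take the outermost {...} span
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        return []  # no JSON object in the output
    try:
        data = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return []
    relations = []
    for item in data.get("relations", []):
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            continue  # missing head/relation/tail
        confidence = item.get("confidence", 1.0)
        if isinstance(confidence, (int, float)) and confidence >= min_confidence:
            relations.append({
                "entity1": item["head"],
                "relation": str(item["relation"]).upper(),
                "entity2": item["tail"],
                "confidence": float(confidence),
                "valid_from": item.get("valid_from"),
            })
    return relations
```

The output keeps the same entity1/relation/entity2 keys as parse_relations, so downstream code can consume either format; the extra fields simply ride along.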
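Relation extraction challenges include inverse relations, where swapping the arguments inverts the type ("Apple acquired Beats" versus "Beats was acquired by Apple"); multi-hop relations that can only be established by combining facts from several sentences; and implicit relations that the text never states directly. Implicit relations require inference beyond surface-level pattern matching: "Company X reduced its workforce" implies that Company X has employees (a HAS_EMPLOYEES relation) and that their number has decreased, even though no employment relation is named.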
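Inverse relations are easiest to handle after extraction, by rewriting every triple into one canonical direction so that both phrasings produce the same edge. The sketch below assumes a hypothetical inverse table (ACQUIRED, EMPLOYS, and FOUNDED are illustrative non-canonical types, not part of the prompt above) and a hypothetical canonicalize helper; any extra keys such as confidence are preserved.

```python
# Hypothetical inverse table: each non-canonical relation maps to its
# canonical counterpart, whose arguments appear in the opposite order.
INVERSE_OF = {
    "ACQUIRED": "ACQUIRED_BY",  # "Apple ACQUIRED Beats" == "Beats ACQUIRED_BY Apple"
    "EMPLOYS": "WORKS_FOR",     # "Acme EMPLOYS Jane" == "Jane WORKS_FOR Acme"
    "FOUNDED": "FOUNDED_BY",    # "Jane FOUNDED Acme" == "Acme FOUNDED_BY Jane"
}

def canonicalize(relation):
    """Rewrite a relation dict so inverse forms share one canonical direction."""
    rel = relation["relation"].upper()
    if rel in INVERSE_OF:
        # Swap the arguments and replace the type with its canonical inverse
        return {
            **relation,
            "entity1": relation["entity2"],
            "relation": INVERSE_OF[rel],
            "entity2": relation["entity1"],
        }
    return {**relation, "relation": rel}
```

Because both surface forms collapse to the same dict, duplicates can then be removed with an ordinary set of (entity1, relation, entity2) tuples.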
Relation extraction is a natural front end for knowledge graph construction. A knowledge graph stores entities as nodes and relations as typed edges, which enables downstream queries that chain relations together (for example, "where is the employer of this person located?"). A local LLM can act as the extraction backend that populates a Neo4j database or a NetworkX graph in an analysis pipeline.
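To make the node/edge mapping concrete, here is a minimal stdlib-only sketch. KnowledgeGraph, follow, and to_cypher are hypothetical names introduced for illustration; the graph stores typed edges in memory and answers chained queries, while to_cypher emits parameterized Neo4j MERGE statements. Cypher does not allow relationship types to be passed as parameters, so the sketch validates each type before interpolating it into the query string.

```python
import re
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory graph: entities are nodes, relations are typed edges."""

    def __init__(self):
        self.nodes = set()
        self.out_edges = defaultdict(set)  # node -> {(relation, target), ...}

    def add_relations(self, relations):
        # Accepts the dicts produced by parse_relations
        for r in relations:
            self.nodes.update((r["entity1"], r["entity2"]))
            self.out_edges[r["entity1"]].add((r["relation"], r["entity2"]))

    def follow(self, start, *path):
        """Return every node reachable from start by following relation types in order."""
        frontier = {start}
        for relation in path:
            frontier = {
                target
                for node in frontier
                for rel, target in self.out_edges.get(node, ())
                if rel == relation
            }
        return frontier

def to_cypher(relations):
    """Build (query, params) pairs for Neo4j. Relationship types cannot be
    Cypher parameters, so they are validated before being interpolated."""
    statements = []
    for r in relations:
        if not re.fullmatch(r"[A-Z][A-Z0-9_]*", r["relation"]):
            raise ValueError(f"unsafe relation type: {r['relation']!r}")
        query = (
            "MERGE (a:Entity {name: $e1}) "
            "MERGE (b:Entity {name: $e2}) "
            f"MERGE (a)-[:{r['relation']}]->(b)"
        )
        statements.append((query, {"e1": r["entity1"], "e2": r["entity2"]}))
    return statements
```

For example, after adding the triples (Jane Doe, WORKS_FOR, Acme Corp) and (Acme Corp, LOCATED_IN, Berlin), follow("Jane Doe", "WORKS_FOR", "LOCATED_IN") answers the two-hop question with {"Berlin"}. With NetworkX, the same dicts could instead be loaded as graph.add_edge(e1, e2, key=relation) on a networkx.MultiDiGraph, which permits several typed edges between the same pair of nodes.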
Coreference resolution has a large effect on relation extraction quality. Pronouns and other referring expressions, such as "the company" or "its", point back to entities mentioned earlier. Without coreference resolution, mentions like "the company" and "its headquarters" are never linked to the original entity, so the relations stated through them cannot be attached to it.
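A trained coreference model is the proper tool here, but a deliberately naive, rule-based stand-in shows where the step fits: resolve mentions in the text first, then run extraction on the rewritten text. The sketch below assumes the organization names are already known (for example, from an earlier entity recognition pass). It replaces a small hypothetical list of anaphors with the most recently mentioned organization; resolve_org_mentions and ORG_ANAPHORS are illustrative names, and the heuristic will misfire whenever "its" refers to something other than an organization.

```python
import re

# Hypothetical anaphor table: each anaphor maps to a template that is
# filled with the antecedent organization's name.
ORG_ANAPHORS = {"the company": "{}", "the firm": "{}", "its": "{}'s"}

def resolve_org_mentions(text, known_orgs):
    """Naive heuristic, not a coreference model: replace each anaphor with
    the most recently mentioned organization from known_orgs."""
    names = sorted(known_orgs, key=len, reverse=True)  # prefer longest names
    alternatives = [re.escape(n) for n in names] + [re.escape(a) for a in ORG_ANAPHORS]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    by_lower = {n.lower(): n for n in names}
    last_org = None
    pieces, pos = [], 0
    for match in pattern.finditer(text):
        mention = match.group(0).lower()
        if mention in by_lower:
            last_org = by_lower[mention]  # a named mention becomes the new antecedent
        elif last_org is not None:
            # Copy text up to the anaphor, then substitute the antecedent
            pieces.append(text[pos:match.start()])
            pieces.append(ORG_ANAPHORS[mention].format(last_org))
            pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces)
```

Given "Acme Corp hired Jane Doe. The company moved its headquarters to Berlin." and the known organization "Acme Corp", the function returns "Acme Corp hired Jane Doe. Acme Corp moved Acme Corp's headquarters to Berlin.", which the extraction prompt can now relate back to Acme Corp. Anaphors with no preceding antecedent are left unchanged.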
Build a relation extraction pipeline that outputs to a knowledge graph format. Include coreference resolution, and evaluate extraction completeness (recall against human-annotated relations) on news article datasets.
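For the evaluation step, a starting point is exact-match scoring of predicted triples against annotated ones, where recall measures completeness and precision guards against a pipeline that simply emits everything. The sketch assumes gold annotations use the same entity1/relation/entity2 dicts as parse_relations; score_relations is a hypothetical helper, and exact string matching after case and whitespace normalization is a simplifying assumption (real evaluations often need alias matching for entity names).

```python
def _key(relation):
    """Normalize a relation dict to a comparable triple (case/whitespace-insensitive)."""
    def norm(name):
        return " ".join(name.lower().split())
    return (norm(relation["entity1"]), relation["relation"].upper(), norm(relation["entity2"]))

def score_relations(predicted, gold):
    """Exact-match precision, recall (completeness), and F1 over relation triples."""
    pred = {_key(r) for r in predicted}
    ref = {_key(r) for r in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    # "missed" lists gold triples the pipeline never produced
    return {"precision": precision, "recall": recall, "f1": f1, "missed": sorted(ref - pred)}
```

Scoring each article separately and inspecting the missed triples is a quick way to see whether losses come from unresolved coreference or from relations spread across sentences; pooling the hit, prediction, and gold counts over all articles gives a micro-averaged score for the whole dataset.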