Named Entity Recognition

Implements named entity recognition (NER) systems for extracting structured information from unstructured text. Covers standard entity types (person, organization, location, date, money) and custom domain-specific entities, with approaches ranging from pre-trained spaCy and Hugging Face models to LLM-based extraction with output schema validation and post-processing normalization.

Usage

Describe the text sources to process, the entity types you need to extract, and how extracted entities will be used downstream (database population, knowledge graphs, analytics). Specify the domain (legal, medical, financial, general) and any custom entity types unique to your use case. This skill provides a complete NER pipeline design with model selection and integration patterns.
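One way to check that a request covers these points is to capture them as a small structured spec before any pipeline design starts. This is a minimal sketch assuming Python 3.9+; `NERTaskSpec`, its fields, and the list of known domains are hypothetical names chosen for illustration, not part of any library.

```python
from dataclasses import dataclass, field

KNOWN_DOMAINS = {"legal", "medical", "financial", "general"}

@dataclass
class NERTaskSpec:
    """Hypothetical record of what a request should describe before pipeline design."""
    text_sources: list[str]        # where the text comes from, e.g. PDFs, clinical notes
    entity_types: list[str]        # every label to extract, standard and custom
    downstream_use: str            # e.g. "database", "knowledge_graph", "analytics"
    domain: str = "general"
    custom_entity_types: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return missing or inconsistent fields; an empty list means the spec is complete."""
        issues = []
        if not self.text_sources:
            issues.append("no text sources given")
        if not self.entity_types:
            issues.append("no entity types given")
        if self.domain not in KNOWN_DOMAINS:
            issues.append(f"unknown domain: {self.domain!r}")
        undeclared = [t for t in self.custom_entity_types if t not in self.entity_types]
        if undeclared:
            issues.append(f"custom types missing from entity_types: {undeclared}")
        return issues

# The first request listed under Examples, expressed as a spec.
contracts = NERTaskSpec(
    text_sources=["legal contracts (PDF), ~5,000 documents"],
    entity_types=["ORG", "CONTRACT_VALUE", "EFFECTIVE_DATE"],
    downstream_use="database",
    domain="legal",
    custom_entity_types=["CONTRACT_VALUE", "EFFECTIVE_DATE"],
)
print(contracts.problems())  # []
```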

Examples

"Extract company names, contract values, and effective dates from 5,000 legal contracts in PDF format"
"Build a medical NER system that identifies drug names, dosages, symptoms, and conditions from clinical notes"
"Create an LLM-based entity extractor that pulls product names, prices, and specifications from competitor websites"

Guidelines

Start with pre-trained models (spaCy, Hugging Face NER) for standard entities before training custom models
Use LLM-based extraction with JSON schema output for complex or domain-specific entity types
Annotate at least 200 diverse examples per custom entity type for fine-tuning, and more for rare entities
Apply post-processing rules to normalize extracted entities (date formats, name casing, deduplication)
Use IOB2 tagging format for token-level annotation to handle multi-word entities correctly
Evaluate with entity-level F1 score (exact match), not just token-level accuracy
Build entity linking to map extracted mentions to canonical forms in your knowledge base
Handle nested entities (e.g., organization within a location) by using separate extraction passes
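For LLM-based extraction with JSON schema output, the model's response still has to be checked before it reaches a database. The sketch below assumes a hypothetical output shape (an `entities` list of `text`/`label`/`start`/`end` objects) and a hypothetical label set, and uses only the standard library. If the third-party `jsonschema` package is available, `jsonschema.validate(instance, schema)` can replace the hand-written type checks; the grounding check, which rejects values the model paraphrased or invented, still has to be written separately.

```python
import json

# Hypothetical schema the LLM is prompted to follow:
# {"entities": [{"text": "...", "label": "ORG", "start": 0, "end": 9}]}
ALLOWED_LABELS = {"ORG", "MONEY", "DATE"}
REQUIRED_FIELDS = {"text": str, "label": str, "start": int, "end": int}

def validate_llm_output(raw, source_text):
    """Return (valid_entities, errors) for a raw LLM response string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc}"]
    if not isinstance(payload, dict) or not isinstance(payload.get("entities"), list):
        return [], ["expected an object with an 'entities' list"]

    valid, errors = [], []
    for i, ent in enumerate(payload["entities"]):
        if not isinstance(ent, dict):
            errors.append(f"entity {i}: not an object")
            continue
        bad = [k for k, typ in REQUIRED_FIELDS.items() if not isinstance(ent.get(k), typ)]
        if bad:
            errors.append(f"entity {i}: missing or mistyped {bad}")
        elif ent["label"] not in ALLOWED_LABELS:
            errors.append(f"entity {i}: unknown label {ent['label']!r}")
        elif source_text[ent["start"]:ent["end"]] != ent["text"]:
            # Grounding check: the span must appear verbatim at the claimed offsets.
            errors.append(f"entity {i}: text does not match source offsets")
        else:
            valid.append(ent)
    return valid, errors

text = "Acme Corp agreed to pay $2.5 million on 1 March 2024."
raw = json.dumps({"entities": [
    {"text": "Acme Corp", "label": "ORG", "start": 0, "end": 9},
    {"text": "$2.5M", "label": "MONEY", "start": 24, "end": 36},  # paraphrased by the model
    {"text": "1 March 2024", "label": "DATE", "start": 40, "end": 52},
]})
valid, errors = validate_llm_output(raw, text)
print([e["text"] for e in valid])  # ['Acme Corp', '1 March 2024']
print(errors)  # ['entity 1: text does not match source offsets']
```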
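The 200-example annotation threshold is easy to track automatically as labeling progresses. This sketch assumes annotations are available as hypothetical `(doc_id, label)` pairs, one per labeled mention, and reports the number of distinct documents alongside the mention count as a rough proxy for diversity; the function name and input format are illustrative only.

```python
from collections import Counter, defaultdict

def under_annotated(annotations, required_labels, min_examples=200):
    """Flag entity types with fewer than `min_examples` annotated mentions.

    `annotations` is an iterable of (doc_id, label) pairs, one per mention.
    The number of distinct documents is reported as a rough diversity signal.
    """
    mentions = Counter()
    docs = defaultdict(set)
    for doc_id, label in annotations:
        mentions[label] += 1
        docs[label].add(doc_id)
    return {
        label: {"mentions": mentions[label], "documents": len(docs[label])}
        for label in required_labels
        if mentions[label] < min_examples
    }

# Hypothetical annotation counts for a medical NER project.
annotations = (
    [(f"note{i % 50}", "DRUG") for i in range(250)]
    + [(f"note{i}", "DOSAGE") for i in range(120)]
)
print(under_annotated(annotations, ["DRUG", "DOSAGE", "SYMPTOM"]))
# {'DOSAGE': {'mentions': 120, 'documents': 120}, 'SYMPTOM': {'mentions': 0, 'documents': 0}}
```

Passing the required labels explicitly matters: a custom type with zero annotations would otherwise never appear in the counts at all.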
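Post-processing normalization (date formats, name casing, deduplication) can start as a few plain rules. In this minimal sketch the accepted date formats are an assumption (`%m/%d/%Y` reads `03/01/2024` as a US month/day date), casing is only applied to `PERSON` because re-casing organizations would break acronyms such as "IBM", and deduplication is case-insensitive per label.

```python
from datetime import datetime

# Assumed input date formats; "%m/%d/%Y" treats 03/01/2024 as US month/day.
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%m/%d/%Y")

def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_entities(entities):
    """Normalize (label, text) pairs and drop duplicates, keeping the first occurrence."""
    seen, result = set(), []
    for label, text in entities:
        if label == "DATE":
            value = normalize_date(text) or " ".join(text.split())
        elif label == "PERSON":
            # Simple per-word capitalization; names like "McDonald" need an exception list.
            value = " ".join(word.capitalize() for word in text.split())
        else:
            # Collapse whitespace only; re-casing ORG would break acronyms like "IBM".
            value = " ".join(text.split())
        key = (label, value.casefold())
        if key not in seen:
            seen.add(key)
            result.append((label, value))
    return result

raw = [("ORG", "Acme Corp"), ("ORG", "ACME  CORP"), ("DATE", "1 March 2024"),
       ("DATE", "March 1, 2024"), ("PERSON", "jane DOE")]
print(normalize_entities(raw))
# [('ORG', 'Acme Corp'), ('DATE', '2024-03-01'), ('PERSON', 'Jane Doe')]
```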
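IOB2 handles multi-word entities because every entity starts with a `B-` tag and continues with `I-` tags, so two adjacent entities of the same type stay separate. The decoder below is a sketch of that convention; treating a stray `I-` tag as the start of a new entity is a lenient choice made here, and strict decoders reject such sequences instead.

```python
def iob2_to_spans(tokens, tags):
    """Decode IOB2 tags into (label, start, end, text) spans; `end` is exclusive.

    An I- tag that does not continue an entity of the same type is treated as
    the start of a new entity (lenient); strict decoders reject it instead.
    """
    spans, label, start = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.append((label, start, i))  # close the open entity
            label, start = (None, None) if tag == "O" else (tag[2:], i)
        elif not tag.startswith("I-"):
            raise ValueError(f"not an IOB2 tag: {tag!r}")
        # Otherwise this is an I- tag continuing the current entity.
    if label is not None:
        spans.append((label, start, len(tags)))
    return [(lab, s, e, " ".join(tokens[s:e])) for lab, s, e in spans]

tokens = ["Acme", "Corp", "hired", "Jane", "Doe", "in", "New", "York"]
tags = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob2_to_spans(tokens, tags))
# [('ORG', 0, 2, 'Acme Corp'), ('PER', 3, 5, 'Jane Doe'), ('LOC', 6, 8, 'New York')]

# B- is what separates two adjacent entities of the same type:
print(iob2_to_spans(["Alice", "Bob"], ["B-PER", "B-PER"]))
# [('PER', 0, 1, 'Alice'), ('PER', 1, 2, 'Bob')]
```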
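The gap between token-level accuracy and entity-level F1 shows up as soon as a boundary is wrong. In this sketch an entity counts as correct only if its label and both boundaries match exactly; the `(doc_id, label, start, end)` tuple format and the example data are illustrative assumptions.

```python
def entity_f1(gold, pred):
    """Exact-match entity-level precision, recall, and F1.

    gold, pred: iterables of (doc_id, label, start, end) tuples. A prediction is
    a true positive only if its label and both boundaries match a gold entity.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def token_accuracy(gold_tags, pred_tags):
    """Fraction of tokens whose tag is correct, shown here only for contrast."""
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)

# The prediction truncates "Acme Corp" to "Acme": one wrong token, one wrong entity.
gold_tags = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER", "O", "O", "O"]
pred_tags = ["B-ORG", "O", "O", "B-PER", "I-PER", "O", "O", "O"]
gold = [("d1", "ORG", 0, 2), ("d1", "PER", 3, 5)]
pred = [("d1", "ORG", 0, 1), ("d1", "PER", 3, 5)]
print(token_accuracy(gold_tags, pred_tags))  # 0.875
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Token accuracy stays high mostly because of the many easy `O` tokens, while entity F1 drops to 0.5 because the truncated organization is unusable downstream.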
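A first version of entity linking can be an alias index over the knowledge base: normalize each mention, look it up, and refuse to guess when an alias belongs to more than one canonical entity. The knowledge base, IDs, and normalization rules here are hypothetical; production linkers usually add fuzzy matching and context-based disambiguation on top.

```python
import unicodedata

# Hypothetical knowledge base: canonical ID -> canonical name and known aliases.
KNOWLEDGE_BASE = {
    "ORG-001": {"name": "Acme Corporation", "aliases": ["Acme Corp", "ACME Corp.", "Acme"]},
    "ORG-002": {"name": "Globex Inc.", "aliases": ["Globex"]},
    "ORG-003": {"name": "Acme Anvils Ltd", "aliases": ["Acme"]},
}

def lookup_key(mention):
    """Normalize a surface form: Unicode NFKC, casefold, collapse spaces, trim '.' and ','."""
    text = unicodedata.normalize("NFKC", mention).casefold()
    return " ".join(text.split()).strip(".,")

def build_alias_index(kb):
    """Map each normalized alias to one ID, or to None when it is ambiguous."""
    index = {}
    for entity_id, record in kb.items():
        for alias in [record["name"], *record["aliases"]]:
            key = lookup_key(alias)
            # Keep the ID if the key is new or already points here; otherwise mark ambiguous.
            index[key] = entity_id if index.get(key, entity_id) == entity_id else None
    return index

def link(mention, index, kb):
    """Return (canonical_id, canonical_name), or (None, None) if unknown or ambiguous."""
    entity_id = index.get(lookup_key(mention))
    return (entity_id, kb[entity_id]["name"]) if entity_id else (None, None)

index = build_alias_index(KNOWLEDGE_BASE)
print(link("ACME  corp", index, KNOWLEDGE_BASE))   # ('ORG-001', 'Acme Corporation')
print(link("globex inc.", index, KNOWLEDGE_BASE))  # ('ORG-002', 'Globex Inc.')
print(link("Acme", index, KNOWLEDGE_BASE))         # (None, None): ambiguous alias
```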
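A flat IOB2 tagger assigns one tag per token, so it cannot label "Apple" as an organization while also labeling "Apple Park" as a location. Running one extraction pass per entity type and keeping every span avoids that conflict. In this sketch the per-type extractors are hypothetical gazetteer lookups standing in for a fine-tuned model, an LLM prompt, or rules; the merge step deliberately does not resolve overlaps and instead reports which spans nest inside which.

```python
import re

# Hypothetical gazetteers standing in for per-type extractors (a fine-tuned model,
# an LLM prompt, or rules). Each extraction pass handles exactly one entity type.
GAZETTEERS = {
    "ORG": ["Apple"],
    "LOC": ["Apple Park", "Cupertino"],
}

def extraction_pass(text, label):
    """Find all mentions of one entity type over the full text, ignoring other types."""
    spans = []
    for name in GAZETTEERS[label]:
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.append((label, m.start(), m.end(), m.group()))
    return spans

def extract_nested(text, labels=("ORG", "LOC")):
    """Merge independent passes without resolving overlaps, then list nesting pairs."""
    spans = sorted(
        (span for label in labels for span in extraction_pass(text, label)),
        key=lambda s: (s[1], -s[2]),  # by start offset, longer (outer) spans first
    )
    nested = [
        (inner[3], outer[3])
        for i, outer in enumerate(spans)
        for j, inner in enumerate(spans)
        if i != j and outer[1] <= inner[1] and inner[2] <= outer[2]
    ]
    return spans, nested

spans, nested = extract_nested("Engineers met at Apple Park in Cupertino.")
print(spans)
# [('LOC', 17, 27, 'Apple Park'), ('ORG', 17, 22, 'Apple'), ('LOC', 31, 40, 'Cupertino')]
print(nested)  # [('Apple', 'Apple Park')]
```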
