Named Entity Recognition
Implements named entity recognition systems for extracting structured information from unstructured text. Covers standard entity types (person, organization, location, date, money) and custom domain-specific entities, using approaches that range from pre-trained spaCy and Hugging Face models to LLM-based extraction with output schema validation and post-processing normalization.
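As a rough illustration of the schema validation and normalization steps mentioned above, here is a minimal sketch using only the Python standard library. The JSON shape (an `entities` list of `text`/`label` records), the label set, the date layouts, and the function names are all assumptions made for this example, not the API of any particular model or library.

```python
import json
from datetime import datetime

# Hypothetical label set for illustration; replace with your own entity types.
ALLOWED_LABELS = {"PERSON", "ORG", "LOC", "DATE", "MONEY"}

# Date layouts this sketch tries, in order. "%m/%d/%Y" assumes US ordering.
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%m/%d/%Y")


def parse_entities(raw_json):
    """Parse model output and keep only well-formed entity records."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    items = payload.get("entities")
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, label = item.get("text"), item.get("label")
        if (isinstance(text, str) and text.strip()
                and isinstance(label, str) and label in ALLOWED_LABELS):
            valid.append({"text": text.strip(), "label": label})
    return valid


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize(entities):
    """Normalize dates and whitespace, then drop case-insensitive duplicates."""
    seen, result = set(), []
    for ent in entities:
        text = ent["text"]
        if ent["label"] == "DATE":
            text = normalize_date(text) or text  # keep the raw text if unparseable
        else:
            text = " ".join(text.split())  # collapse runs of whitespace
        key = (ent["label"], text.casefold())
        if key not in seen:
            seen.add(key)
            result.append({"text": text, "label": ent["label"]})
    return result
```

Records with a missing field or an unknown label are dropped rather than repaired, which keeps malformed model output out of downstream stores; a production pipeline might log those rejects for review instead.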
Usage
Describe the text sources to process, the entity types you need to extract, and how extracted entities will be used downstream (database population, knowledge graphs, analytics). Specify the domain (legal, medical, financial, general) and any custom entity types unique to your use case. This skill provides a complete NER pipeline design with model selection and integration patterns.
Examples
- "Extract company names, contract values, and effective dates from 5,000 legal contracts in PDF format"
- "Build a medical NER system that identifies drug names, dosages, symptoms, and conditions from clinical notes"
- "Create an LLM-based entity extractor that pulls product names, prices, and specifications from competitor websites"
Guidelines
- Start with pre-trained models (spaCy, Hugging Face NER) for standard entities before training custom models
- Use LLM-based extraction with JSON schema output for complex or domain-specific entity types
- Annotate at least 200 diverse examples per custom entity type for fine-tuning, more for rare entities
- Apply post-processing rules to normalize extracted entities (date formats, name casing, deduplication)
- Use IOB2 tagging format for token-level annotation to handle multi-word entities correctly
- Evaluate with entity-level F1 score (exact match on both span boundaries and entity type), not just token-level accuracy
- Build entity linking to map extracted mentions to canonical forms in your knowledge base
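- Handle nested entities (e.g., the location "England" inside the organization "Bank of England") by using separate extraction passes, since a single flat tag sequence cannot represent overlapping spans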
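The IOB2 and entity-level F1 guidelines above can be sketched in a few lines of plain Python. This is an illustrative version, not a reference implementation: the function names are hypothetical, spans use an exclusive end index, and a stray `I-` tag that does not continue an entity of the same label is leniently treated as the start of a new entity.

```python
def iob2_to_spans(tags):
    """Convert an IOB2 tag sequence to a set of (label, start, end) spans."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))  # close the open entity
            if prefix in ("B", "I"):
                start, label = i, tag_label   # open a new entity
            else:
                start, label = None, None
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_tags, pred_tags):
        gold_spans, pred_spans = iob2_to_spans(gold), iob2_to_spans(pred)
        tp += len(gold_spans & pred_spans)  # same label and same boundaries
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Tokens: Ada Lovelace joined Acme Corp
gold = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
precision, recall, f1 = entity_f1([gold], [pred])
```

In this example the prediction gets four of five tokens right (80% token accuracy), but it truncates "Acme Corp" to "Acme", so only one of the two gold entities matches exactly and entity-level F1 is 0.5. That gap is why token-level accuracy alone can overstate extraction quality.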