Text Classification

Implements text classification systems for categorizing documents, tickets, reviews, emails, and other text content. Covers approach selection (TF-IDF + classical ML, transformer fine-tuning, embedding + kNN, LLM zero/few-shot), label taxonomy design, training data preparation, model evaluation, confidence calibration, and production deployment with monitoring.
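As a rough illustration of the TF-IDF + classical ML route, here is a dependency-free sketch that pairs TF-IDF vectors with a nearest-centroid classifier, which scores each class by cosine similarity to that class's averaged vector. The whitespace tokenizer, the centroid classifier, and the toy tickets are assumptions for illustration only. A production baseline would more typically use scikit-learn's TfidfVectorizer with a linear model such as LogisticRegression.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace tokenization; punctuation stays attached to words
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, matching scikit-learn's default smoothing
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(doc, idf):
    # Term counts weighted by IDF; terms never seen in training are dropped
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in counts.items()})


def fit_centroids(docs, labels, idf):
    # One centroid per class: the L2-normalized sum of that class's TF-IDF vectors
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in tfidf(doc, idf).items():
            sums[label][t] += v
    return {label: l2_normalize(vec) for label, vec in sums.items()}


def predict(doc, idf, centroids):
    # Both vectors are unit length, so the sparse dot product is cosine similarity.
    # An input with no known terms scores 0 everywhere and falls back to the first class.
    vec = tfidf(doc, idf)

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=score)


# Hypothetical toy training data
train_docs = [
    "refund charged twice on my invoice",
    "invoice shows wrong billing amount",
    "app crashes when uploading a file",
    "error message after the latest update",
    "reset my account password",
    "change the email on my account",
]
train_labels = ["billing", "billing", "technical", "technical", "account", "account"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("I was charged twice, need a refund", idf, centroids))  # billing
```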

Usage

Describe what text you need to classify, the target categories, available labeled data volume, and latency/cost constraints. Specify whether categories are mutually exclusive (single-label) or overlapping (multi-label). This skill recommends the best approach for your constraints and provides implementation guidance.
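The single-label versus multi-label distinction mostly changes how a model's raw per-label scores are decoded. This minimal sketch assumes the classifier emits one real-valued score (logit) per label: single-label decoding applies a softmax and keeps the top label, while multi-label decoding applies an independent sigmoid to each label and keeps every label above a threshold. The 0.5 threshold and the label names are placeholders; per-label thresholds would normally be tuned on validation data.

```python
import math


def softmax(scores):
    # Subtract the max score before exponentiating to avoid overflow
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_single_label(scores):
    # Mutually exclusive classes: probabilities compete and sum to 1; exactly one label wins
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return label, probs[label]


def decode_multi_label(scores, threshold=0.5):
    # Overlapping classes: each label is an independent yes/no decision,
    # so zero, one, or several labels can fire for the same text
    return sorted(label for label, s in scores.items() if sigmoid(s) >= threshold)


print(decode_single_label({"billing": 2.0, "technical": 0.5, "account": -1.0}))
print(decode_multi_label({"toxicity": 1.2, "spam": -2.0, "pii": 0.3}))  # ['pii', 'toxicity']
```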

Examples

"Build a support ticket classifier that routes tickets to billing, technical, or account teams with 95% accuracy"
"Create a zero-shot classifier using Claude to categorize product feedback into feature requests, bugs, and praise"
"Design a multi-label content moderation system that flags text for toxicity, spam, and personally identifiable information"

Guidelines

Start with LLM zero-shot classification if you have fewer than 100 labeled examples per category
Use embedding similarity with kNN for 100 to 1,000 examples per category; fine-tune a transformer above 1,000 examples per category
Design a flat taxonomy with 5-15 categories initially; add hierarchy only when precision demands it
Include an "Other/Unknown" category to catch inputs that don't fit defined classes
Set confidence thresholds and route low-confidence predictions to human review queues
Measure precision and recall per class, not just overall accuracy; under class imbalance, a model can reach high accuracy while missing a minority class almost entirely
Use stratified train/test splits to ensure minority classes are represented in evaluation sets
Monitor classification drift in production by sampling predictions weekly for manual review
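The data-volume rules of thumb in the first two guidelines can be written as a small decision function. This is a hypothetical helper, not part of any library; it assumes the 100 to 1,000 range is counted per category and applies the thresholds to the smallest class.

```python
from collections import Counter


def recommend_approach(labels):
    # Hypothetical helper: apply the thresholds to the smallest class, since the
    # rarest category is the one most likely to lack enough training signal
    smallest = min(Counter(labels).values())
    if smallest < 100:
        return "llm_zero_or_few_shot"
    if smallest <= 1000:
        return "embedding_knn"
    return "fine_tune_transformer"


print(recommend_approach(["billing"] * 1500 + ["account"] * 40))  # llm_zero_or_few_shot
```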
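For the embedding + kNN route, classification reduces to finding the k labeled examples whose embeddings are most similar to the input and taking a majority vote. The sketch below assumes embeddings are plain lists of floats produced by whatever embedding model you use; the 2-D vectors are made-up stand-ins, and k=3 is an arbitrary default.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, examples, k=3):
    # examples: list of (embedding, label) pairs from the labeled set
    neighbors = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, most_common keeps first-seen order, so the label
    # reached first from the nearest neighbor wins
    label, count = votes.most_common(1)[0]
    # The vote share doubles as a rough confidence score
    return label, count / len(neighbors)


# Made-up 2-D "embeddings" standing in for real model output
examples = [
    ([1.0, 0.1], "billing"),
    ([0.9, 0.2], "billing"),
    ([0.8, 0.3], "billing"),
    ([0.1, 1.0], "technical"),
    ([0.2, 0.9], "technical"),
]
print(knn_classify([0.95, 0.15], examples))  # ('billing', 1.0)
```

Brute-force sorting is fine for a few thousand examples; larger labeled sets usually call for an approximate nearest-neighbor index.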
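Confidence thresholds and the "Other/Unknown" class combine naturally in a routing step. This sketch assumes the classifier returns a probability per label, uses a hypothetical 0.8 threshold, and also treats a top prediction of "other" as a review case; both choices are assumptions to adjust against your review-queue capacity.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float
    needs_review: bool


def route(probs, threshold=0.8, unknown_label="other"):
    # probs: label -> probability from any classifier (e.g., a softmax output)
    label = max(probs, key=probs.get)
    confidence = probs[label]
    # Send to the human review queue when the model is unsure, or when it
    # explicitly says the input fits none of the defined classes
    needs_review = confidence < threshold or label == unknown_label
    return Decision(label, confidence, needs_review)


print(route({"billing": 0.92, "technical": 0.05, "other": 0.03}))
print(route({"billing": 0.55, "technical": 0.40, "other": 0.05}))
```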
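Per-class precision and recall can be computed directly from paired true and predicted labels. In the toy data below, overall accuracy is 90%, yet the minority "account" class has zero recall, which is exactly what an accuracy-only report would hide. Undefined precision (a class that is never predicted) is reported as 0.0 here; scikit-learn's precision_recall_fscore_support exposes the same choice through its zero_division parameter.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    # Count true positives, false positives, and false negatives per label
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


y_true = ["billing"] * 9 + ["account"]
y_pred = ["billing"] * 10
print(per_class_report(y_true, y_pred))  # {'account': (0.0, 0.0), 'billing': (0.9, 1.0)}
```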
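A stratified split shuffles and splits each class separately so that every class keeps roughly the same share in train and test. This is a stdlib-only sketch with an assumed 20% test fraction and a fixed seed for reproducibility; scikit-learn's train_test_split does the same via its stratify argument.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, test_fraction=0.2, seed=0):
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # Classes with a single example stay in train; otherwise each class
        # contributes at least one test example, even if the fraction rounds to 0
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test += [(x, label) for x in group[:n_test]]
        train += [(x, label) for x in group[n_test:]]
    return train, test


items = [f"ticket-{i}" for i in range(100)]
labels = ["billing"] * 90 + ["account"] * 10
train, test = stratified_split(items, labels)
print(len(train), len(test))  # 80 20
```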
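One way to implement the weekly review is to sample a fixed number of logged predictions per predicted label, so that rare classes still reach reviewers. The log format (text, label, confidence) and the sample size of 5 per label are assumptions for illustration.

```python
import random
from collections import defaultdict


def weekly_review_sample(predictions, per_label=5, seed=None):
    # predictions: list of (text, predicted_label, confidence) tuples logged this week
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pred in predictions:
        by_label[pred[1]].append(pred)
    sample = []
    for group in by_label.values():
        # Take every prediction for labels with fewer than per_label entries
        sample += rng.sample(group, min(per_label, len(group)))
    return sample


week = [(f"t{i}", "billing", 0.9) for i in range(20)] + [(f"a{i}", "account", 0.7) for i in range(3)]
print(len(weekly_review_sample(week, seed=1)))  # 8
```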
