Data Cleaning Pipeline
Build robust data cleaning pipelines that ensure data quality for analysis and modeling.
Usage
Describe your dataset and its known quality issues to get a cleaning pipeline tailored to them.
Examples
- "Clean a customer dataset with 30% missing values"
- "Build a preprocessing pipeline for messy CSV imports"
- "Handle outliers in our sales transaction data"
Guidelines
- Profile data before cleaning to understand the scope of the quality issues
- Document every cleaning decision and its rationale
- Handle missing values based on their mechanism: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR)
- Remove exact duplicates and flag fuzzy duplicates
- Validate cleaned data against business rules
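
The guidelines above can be sketched as a small pipeline. This is a minimal illustration, not a definitive implementation: it uses only the Python standard library and represents rows as dictionaries. All function names, the sample rows, and the 0.9 similarity threshold are assumptions made for the example. Median imputation is shown only as one option that is defensible when values are plausibly MCAR; MAR and MNAR data usually need a model-based or domain-specific approach.

```python
from difflib import SequenceMatcher
from statistics import median


def profile(rows, columns):
    """Count missing values (None or empty string) per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # order-independent, hashable row key
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def flag_fuzzy_duplicates(rows, column, threshold=0.9):
    """Return index pairs whose strings in `column` are near-identical.

    Pairs are flagged for review, not removed. The threshold is an
    arbitrary assumption and should be tuned per dataset.
    """
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            a, b = rows[i].get(column), rows[j].get(column)
            if a and b:
                score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                if score >= threshold:
                    pairs.append((i, j))
    return pairs


def impute_median(rows, column, log):
    """Fill None values with the column median and record the decision."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed)
    filled = [
        {**r, column: fill} if r.get(column) is None else dict(r) for r in rows
    ]
    n = sum(1 for r in rows if r.get(column) is None)
    log.append(f"{column}: imputed {n} value(s) with median {fill} (assumes MCAR)")
    return filled


def validate(rows, rules):
    """Return (row_index, rule_name) for every business rule a row breaks."""
    return [
        (i, name)
        for i, r in enumerate(rows)
        for name, check in rules.items()
        if not check(r)
    ]


# Hypothetical sample data, invented for this sketch.
rows = [
    {"id": 1, "name": "Acme Corp", "amount": 120.0},
    {"id": 1, "name": "Acme Corp", "amount": 120.0},   # exact duplicate
    {"id": 2, "name": "Acme Corp.", "amount": None},   # fuzzy duplicate, missing amount
    {"id": 3, "name": "Globex", "amount": -5.0},       # breaks a business rule
]
decision_log = []

missing_counts = profile(rows, ["id", "name", "amount"])
deduped = drop_exact_duplicates(rows)
review_pairs = flag_fuzzy_duplicates(deduped, "name")
cleaned = impute_median(deduped, "amount", decision_log)
violations = validate(cleaned, {"amount_non_negative": lambda r: r["amount"] >= 0})
```

The order mirrors the guidelines: profile first, then deduplicate, then impute, then validate. Each step returns new rows rather than mutating its input, so the decision log and the intermediate results can be inspected or stored alongside the cleaned data.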