
Machine Learning Pipeline

Verified by Community


ml-pipeline, mlops, feature-engineering, model-deployment, machine-learning


Architects complete machine learning pipelines covering every stage from raw data to production predictions. Includes data ingestion and validation, feature engineering and stores, model training with experiment tracking, hyperparameter optimization, evaluation frameworks, deployment patterns (batch, real-time, edge), A/B testing, and production monitoring with data and model drift detection.

Usage

Describe your ML problem (classification, regression, ranking, etc.), data sources, prediction serving requirements (batch vs real-time), and team maturity. Specify infrastructure preferences (AWS SageMaker, GCP Vertex, self-hosted MLflow). This skill produces a complete pipeline architecture with tool recommendations and implementation phases.

Examples

  • "Design an ML pipeline for a fraud detection system processing 10K transactions per second with sub-100ms latency"
  • "Build a batch prediction pipeline using Airflow and MLflow for weekly customer churn scoring on 5M users"
  • "Create a feature store architecture that serves both real-time and batch features for a recommendation system"

Guidelines

  • Version everything: datasets, features, model code, hyperparameters, and trained model artifacts
  • Validate input data with schema checks and statistical tests before it enters the training pipeline
  • Use a feature store so training and serving read identical feature values, preventing training-serving skew
  • Track all experiments with MLflow, W&B, or similar tooling so results are reproducible
  • Automate hyperparameter tuning with Optuna or Ray Tune rather than manual grid search
  • Deploy models behind feature flags and use shadow mode or canary releases before full rollout
  • Monitor prediction distributions, feature distributions, and model accuracy metrics in production
  • Set up automated retraining triggers based on drift detection thresholds, not fixed schedules
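The schema-check guideline above can be sketched as a small pre-training gate. This is a minimal illustration, not any specific library's API; `validate_schema` and its `rows`/`schema` shapes are hypothetical, and a real pipeline would likely use a tool such as Great Expectations or pandera instead.

```python
def validate_schema(rows, schema):
    """Check each record against expected columns and types before training.

    `rows` is a list of dicts, `schema` maps column name -> expected Python
    type. Returns a list of human-readable errors (empty means the batch
    passes the gate). Hypothetical helper for illustration only.
    """
    errors = []
    expected = set(schema)
    for i, row in enumerate(rows):
        missing = expected - row.keys()
        extra = row.keys() - expected
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors
```

In a pipeline, a non-empty error list would fail the ingestion task loudly rather than letting malformed rows reach feature engineering.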
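The automated-tuning guideline is usually realized with Optuna or Ray Tune, as noted above. To show the shape of such a loop without depending on either library, here is a stdlib-only random-search sketch; `random_search` and its trial-log format are hypothetical stand-ins for a real tuner's study object.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal automated tuning loop: sample params, evaluate, log each trial.

    `space` maps parameter name -> (low, high) for uniform sampling. Returns
    the best trial (lowest score) and the full trial log, so every run is
    reproducible from the seed and inspectable afterwards.
    """
    rng = random.Random(seed)
    trials = []
    for i in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        trials.append({"trial": i, "params": params, "score": score})
    best = min(trials, key=lambda t: t["score"])
    return best, trials
```

A real tuner adds smarter sampling (TPE, Bayesian optimization) and pruning, but the contract is the same: a logged, reproducible record of every trial rather than an untracked manual grid search.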
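The canary-release guideline implies deterministic traffic splitting: the same user should always hit the same model variant so metrics are comparable. A minimal sketch, assuming hash-based bucketing (the `route_model` helper and its 5% default are illustrative, not any platform's API):

```python
import hashlib

def route_model(user_id, canary_fraction=0.05):
    """Deterministically route a small share of traffic to the canary model.

    Hashing the user id into 10,000 buckets gives a stable assignment: a user
    stays on the same variant across requests, and roughly `canary_fraction`
    of users land on the canary.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "stable"
```

Shadow mode is the same idea with both models invoked: the canary's predictions are logged for comparison but the stable model's response is the one served.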
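The drift-triggered retraining guideline needs a concrete distance between the training distribution and live traffic. One common choice is the Population Stability Index (PSI); the sketch below is a stdlib-only illustration with illustrative thresholds (PSI above roughly 0.25 is often treated as significant drift, but the cutoff is a tuning decision, not a standard).

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and live data.

    Bins are derived from the reference (training-time) sample; live values
    outside that range are clamped into the edge bins. Small smoothing keeps
    empty bins from producing log(0).
    """
    lo, hi = min(expected), max(expected)

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            if hi > lo:
                idx = min(max(int((x - lo) / (hi - lo) * bins), 0), bins - 1)
            else:
                idx = 0
            counts[idx] += 1
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job would compute PSI per feature on each serving window and fire the retraining trigger when any feature crosses the threshold, rather than retraining on a fixed calendar schedule.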