
Machine Learning Pipeline

Verified by Community


ml-pipeline, mlops, feature-engineering, model-deployment, machine-learning


Architects complete machine learning pipelines covering every stage from raw data to production predictions. Includes data ingestion and validation, feature engineering and stores, model training with experiment tracking, hyperparameter optimization, evaluation frameworks, deployment patterns (batch, real-time, edge), A/B testing, and production monitoring with data and model drift detection.

Usage

Describe your ML problem (classification, regression, ranking, etc.), data sources, prediction serving requirements (batch vs real-time), and team maturity. Specify infrastructure preferences (AWS SageMaker, GCP Vertex, self-hosted MLflow). This skill produces a complete pipeline architecture with tool recommendations and implementation phases.

Examples

  • "Design an ML pipeline for a fraud detection system processing 10K transactions per second with sub-100ms latency"
  • "Build a batch prediction pipeline using Airflow and MLflow for weekly customer churn scoring on 5M users"
  • "Create a feature store architecture that serves both real-time and batch features for a recommendation system"

Guidelines

  • Version everything: datasets, features, model code, hyperparameters, and trained model artifacts
  • Validate input data with schema checks and statistical tests before it enters the training pipeline
  • Use a feature store so training and serving read identical feature values, preventing training-serving skew
  • Track all experiments with MLflow, W&B, or similar tooling so results are reproducible
  • Automate hyperparameter tuning with Optuna or Ray Tune rather than manual grid search
  • Deploy models behind feature flags and use shadow mode or canary releases before full rollout
  • Monitor prediction distributions, feature distributions, and model accuracy metrics in production
  • Set up automated retraining triggers based on drift detection thresholds, not fixed schedules
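The schema-check guideline above can be sketched as a small pre-training gate. This is a minimal illustration, not any specific library's API; `validate_schema` and its `rows`/`schema` shapes are hypothetical, and a real pipeline would likely use a tool such as Great Expectations or pandera instead.

```python
def validate_schema(rows, schema):
    """Check each record against expected columns and types before training.

    `rows` is a list of dicts, `schema` maps column name -> expected Python
    type. Returns a list of human-readable errors (empty means the batch
    passes the gate). Hypothetical helper for illustration only.
    """
    errors = []
    expected = set(schema)
    for i, row in enumerate(rows):
        missing = expected - row.keys()
        extra = row.keys() - expected
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors
```

In a pipeline, a non-empty error list would fail the ingestion task loudly rather than letting malformed rows reach feature engineering.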
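The automated-tuning guideline is usually realized with Optuna or Ray Tune, as noted above. To show the shape of such a loop without depending on either library, here is a stdlib-only random-search sketch; `random_search` and its trial-log format are hypothetical stand-ins for a real tuner's study object.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Minimal automated tuning loop: sample params, evaluate, log each trial.

    `space` maps parameter name -> (low, high) for uniform sampling. Returns
    the best trial (lowest score) and the full trial log, so every run is
    reproducible from the seed and inspectable afterwards.
    """
    rng = random.Random(seed)
    trials = []
    for i in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        trials.append({"trial": i, "params": params, "score": score})
    best = min(trials, key=lambda t: t["score"])
    return best, trials
```

A real tuner adds smarter sampling (TPE, Bayesian optimization) and pruning, but the contract is the same: a logged, reproducible record of every trial rather than an untracked manual grid search.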
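The canary-release guideline implies deterministic traffic splitting: the same user should always hit the same model variant so metrics are comparable. A minimal sketch, assuming hash-based bucketing (the `route_model` helper and its 5% default are illustrative, not any platform's API):

```python
import hashlib

def route_model(user_id, canary_fraction=0.05):
    """Deterministically route a small share of traffic to the canary model.

    Hashing the user id into 10,000 buckets gives a stable assignment: a user
    stays on the same variant across requests, and roughly `canary_fraction`
    of users land on the canary.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "stable"
```

Shadow mode is the same idea with both models invoked: the canary's predictions are logged for comparison but the stable model's response is the one served.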
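The drift-triggered retraining guideline needs a concrete distance between the training distribution and live traffic. One common choice is the Population Stability Index (PSI); the sketch below is a stdlib-only illustration with illustrative thresholds (PSI above roughly 0.25 is often treated as significant drift, but the cutoff is a tuning decision, not a standard).

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and live data.

    Bins are derived from the reference (training-time) sample; live values
    outside that range are clamped into the edge bins. Small smoothing keeps
    empty bins from producing log(0).
    """
    lo, hi = min(expected), max(expected)

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            if hi > lo:
                idx = min(max(int((x - lo) / (hi - lo) * bins), 0), bins - 1)
            else:
                idx = 0
            counts[idx] += 1
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job would compute PSI per feature on each serving window and fire the retraining trigger when any feature crosses the threshold, rather than retraining on a fixed calendar schedule.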