ETL Pipeline Design
Designs Extract, Transform, Load (ETL) and ELT pipeline architectures for data integration, warehousing, and analytics. Covers source extraction patterns (CDC, full load, incremental), transformation logic (cleaning, enrichment, aggregation), loading strategies (upsert, append, slowly changing dimensions), error handling, data quality checks, orchestration with Airflow or Prefect, and SQL-based transformation with dbt.
Usage
Describe your data sources (databases, APIs, files), destination (data warehouse, data lake), transformation requirements, data volume, and freshness needs. Specify your preferred tools and cloud platform. The skill designs a complete pipeline architecture with extraction, transformation, and loading patterns for your specific scenario.
Examples
- "Design an ETL pipeline that extracts from PostgreSQL and 3 REST APIs, transforms in Python, loads to BigQuery"
- "Create a CDC pipeline using Debezium to stream database changes to a Kafka topic for real-time analytics"
- "Build a dbt transformation layer that creates dimensional models from raw Snowflake staging tables"
- "Design an incremental pipeline that processes only new/changed records using watermark timestamps"
Guidelines
- Prefer ELT over ETL when your warehouse has the compute power — transform in SQL using dbt for maintainability
- Implement idempotent loads: rerunning a pipeline for the same time period should produce identical results (see the partition-swap sketch after this list)
- Use incremental extraction with high-watermark columns (updated_at) to avoid full table scans on each run (see the watermark sketch below)
- Build data quality checks between stages: row counts, null checks, referential integrity, freshness assertions (see the quality-check sketch below)
- Log extraction metadata (row counts, timestamps, source versions) for debugging and lineage tracking (see the run-log sketch below)
- Handle late-arriving data and out-of-order events with appropriate windowing and upsert strategies (see the upsert sketch below)
- Use dead-letter queues for records that fail transformation, with alerting for human review (see the dead-letter sketch below)
- Design for backfill capability: any pipeline should be able to reprocess historical data on demand
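The guidelines above are illustrated with minimal Python sketches. Table names, schemas, and the use of SQLite are assumptions made so each example is self-contained and runnable, not part of the skill itself. First, the partition-swap sketch for idempotent loads: each run deletes and re-inserts one date partition inside a single transaction, so rerunning a day, or replaying a whole historical range for a backfill, produces identical results.

```python
import sqlite3
from datetime import date

# Illustrative target table; a real pipeline would target a warehouse, but the
# pattern is the same: swap exactly one partition per run inside a transaction.
DDL = """CREATE TABLE IF NOT EXISTS fct_orders (
    order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"""

def load_partition(conn: sqlite3.Connection, run_date: date, rows: list[tuple]) -> None:
    """Replace one date partition of fct_orders; safe to rerun for any date."""
    with conn:  # one transaction: either the whole partition swaps or nothing does
        conn.execute("DELETE FROM fct_orders WHERE order_date = ?", (run_date.isoformat(),))
        conn.executemany(
            "INSERT INTO fct_orders (order_id, order_date, amount) VALUES (?, ?, ?)",
            rows,
        )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    day = date(2024, 1, 15)
    batch = [(1, day.isoformat(), 19.99), (2, day.isoformat(), 5.00)]
    load_partition(conn, day, batch)
    load_partition(conn, day, batch)  # rerun or backfill: still exactly 2 rows for this day
    print(conn.execute("SELECT COUNT(*) FROM fct_orders").fetchone()[0])  # -> 2
```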
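The watermark sketch: incremental extraction that pulls only rows changed since the last run. The orders source table, its updated_at column, and the etl_state bookkeeping table are illustrative assumptions.

```python
import sqlite3

# Assumed bookkeeping table, created once:
#   CREATE TABLE etl_state (pipeline TEXT PRIMARY KEY, watermark TEXT)

def get_watermark(state: sqlite3.Connection, pipeline: str) -> str:
    row = state.execute(
        "SELECT watermark FROM etl_state WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    return row[0] if row else "1970-01-01T00:00:00"  # first run: extract everything

def extract_incremental(source: sqlite3.Connection, state: sqlite3.Connection,
                        pipeline: str = "orders") -> list[tuple]:
    """Return only rows changed since the last run, then advance the watermark."""
    watermark = get_watermark(state, pipeline)
    rows = source.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if rows:
        with state:  # persist the new high watermark for the next run
            state.execute(
                "INSERT INTO etl_state (pipeline, watermark) VALUES (?, ?) "
                "ON CONFLICT(pipeline) DO UPDATE SET watermark = excluded.watermark",
                (pipeline, rows[-1][2]),
            )
    return rows
```

In practice the watermark should only be advanced once the downstream load has committed; otherwise a failed run would skip those rows on retry.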
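The quality-check sketch: batch-level assertions an orchestrator task could run between extract and load. The record shape (dictionaries with ISO 8601 UTC timestamps), the thresholds, and the check names are illustrative; referential-integrity checks would typically run in the warehouse itself, for example as dbt tests.

```python
from datetime import datetime, timedelta, timezone

class DataQualityError(Exception):
    """Raised when a batch fails a check so the orchestrator can halt the load."""

def check_batch(rows: list[dict], *, min_rows: int, not_null: list[str],
                freshness_field: str, max_lag: timedelta) -> None:
    # Row count: catches silently empty or truncated extracts.
    if len(rows) < min_rows:
        raise DataQualityError(f"expected at least {min_rows} rows, got {len(rows)}")
    # Null checks on required columns.
    for col in not_null:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if nulls:
            raise DataQualityError(f"{nulls} null value(s) in required column {col!r}")
    # Freshness: the newest record must be recent enough. Timestamps are assumed
    # to be ISO 8601 with an explicit offset, e.g. '2024-01-15T08:00:00+00:00'.
    if rows:
        newest = max(datetime.fromisoformat(r[freshness_field]) for r in rows)
        if datetime.now(timezone.utc) - newest > max_lag:
            raise DataQualityError(f"stale batch: newest record is {newest.isoformat()}")
```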
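The run-log sketch: recording extraction metadata per run so failures can be debugged and lineage traced. The pipeline_runs schema is an assumption; a real deployment might write to a warehouse audit table or the orchestrator's metadata store.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed audit table for per-run metadata.
RUN_LOG_DDL = """CREATE TABLE IF NOT EXISTS pipeline_runs (
    pipeline TEXT, run_started_at TEXT, row_count INTEGER, source_version TEXT)"""

def log_run(meta: sqlite3.Connection, pipeline: str, started_at: datetime,
            row_count: int, source_version: str) -> None:
    """Record what ran, when, and how much data moved."""
    with meta:
        meta.execute(RUN_LOG_DDL)
        meta.execute(
            "INSERT INTO pipeline_runs (pipeline, run_started_at, row_count, source_version) "
            "VALUES (?, ?, ?, ?)",
            (pipeline, started_at.astimezone(timezone.utc).isoformat(),
             row_count, source_version),
        )
```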
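The upsert sketch for late-arriving and out-of-order events: a conditional upsert keeps the newest version of each key regardless of arrival order. SQLite's ON CONFLICT syntax stands in for a warehouse MERGE, and the orders_current table with order_id as its primary key is assumed.

```python
import sqlite3

# Assumed target table:
#   CREATE TABLE orders_current (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)
UPSERT = """
INSERT INTO orders_current (order_id, amount, updated_at)
VALUES (?, ?, ?)
ON CONFLICT(order_id) DO UPDATE SET
    amount = excluded.amount,
    updated_at = excluded.updated_at
WHERE excluded.updated_at > orders_current.updated_at
"""

def apply_events(conn: sqlite3.Connection, events: list[tuple]) -> None:
    """Apply (order_id, amount, updated_at) events in any arrival order;
    the newest updated_at per key wins, so stale duplicates are ignored."""
    with conn:
        conn.executemany(UPSERT, events)
```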
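The dead-letter sketch: records that fail transformation are quarantined instead of failing the whole run. A JSON-lines file stands in for a real dead-letter queue such as a Kafka topic or an SQS queue, and a warning is emitted for the alerting layer to pick up.

```python
import json
import logging

logger = logging.getLogger("pipeline")

def transform_batch(records: list[dict], transform, dead_letter_path: str) -> list[dict]:
    """Apply `transform` to each record; route failures to a dead-letter file."""
    good, failed = [], 0
    with open(dead_letter_path, "a", encoding="utf-8") as dlq:
        for record in records:
            try:
                good.append(transform(record))
            except Exception as exc:  # keep the pipeline moving past bad records
                failed += 1
                dlq.write(json.dumps({"record": record, "error": str(exc)}) + "\n")
    if failed:
        # Wire this into alerting (a metric or on-call notification) for human review.
        logger.warning("routed %d of %d records to %s", failed, len(records), dead_letter_path)
    return good
```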