Neural Network Designer
Guides the design of neural network architectures for various tasks including image classification, sequence modeling, generative models, and multimodal systems. Covers layer selection (dense, convolutional, recurrent, attention), activation functions, regularization techniques (dropout, batch norm, weight decay), optimizer selection, learning rate scheduling, and systematic debugging of training issues.
Usage
Describe your task (classification, generation, detection, etc.), input data format and dimensions, available compute resources, and performance targets. Specify any constraints on model size or inference latency. This skill recommends an architecture with layer configurations, training hyperparameters, and a debugging checklist for common training failures.
Examples
- "Design a CNN architecture for classifying 224x224 medical images into 15 disease categories with limited training data"
- "Create a transformer encoder architecture for sequence classification that fits under 50MB and runs on mobile devices"
- "Architect a U-Net variant for semantic segmentation of satellite imagery with 10 land-use classes"
Guidelines
- Start with proven architectures (ResNet, EfficientNet, BERT) and modify rather than building from scratch
- Use transfer learning from pre-trained models when your dataset has fewer than 10K labeled examples
- Apply batch normalization or layer normalization to stabilize training and allow higher learning rates
- Use dropout (0.1-0.5) and weight decay (1e-4 to 1e-2) together to prevent overfitting
- Start with Adam optimizer (lr=3e-4) as a baseline; switch to AdamW or SGD+momentum for fine-tuning
- Implement learning rate warmup for transformer models and cosine annealing for longer training runs
- Monitor gradient norms during training to detect vanishing or exploding gradient problems early
- Profile memory usage and compute per layer to identify bottlenecks before scaling up the model
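The warmup-plus-cosine-annealing schedule recommended above can be sketched as a framework-agnostic function. The name `lr_at_step` and the default values (base lr of 3e-4 from the Adam guideline, 1,000 warmup steps, a 1e-6 floor) are illustrative choices, not fixed prescriptions; in PyTorch the same effect is typically achieved by combining built-in schedulers from `torch.optim.lr_scheduler`.

```python
import math

def lr_at_step(step, base_lr=3e-4, warmup_steps=1000,
               total_steps=100_000, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        # Linear ramp: avoids unstable large updates in early transformer training
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup training completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay from base_lr at progress=0 to min_lr at progress=1
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Plotting `lr_at_step` over the planned number of steps before training is a cheap sanity check that the schedule peaks and decays where you expect.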
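For the gradient-norm monitoring guideline, the quantity usually logged is the global L2 norm over all parameter gradients. A minimal pure-Python sketch (the function name `global_grad_norm` is hypothetical) is below; in PyTorch, `torch.nn.utils.clip_grad_norm_` computes and clips this same norm in one call.

```python
import math

def global_grad_norm(grads):
    """Global L2 norm over all parameter gradients.

    `grads` is a list of flat lists of gradient values, one per parameter
    tensor. A norm trending toward zero suggests vanishing gradients;
    sudden spikes suggest exploding gradients.
    """
    total = 0.0
    for g in grads:
        for v in g:
            total += v * v
    return math.sqrt(total)
```

Logging this value every step (or every few steps) makes both failure modes visible long before the loss curve reveals them.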
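The per-layer profiling guideline can be approximated on paper before any profiler runs: parameter count and multiply-accumulate (MAC) count per layer identify the likely bottlenecks. A rough sketch for a single 2-D convolution (the function `conv2d_cost` is illustrative and assumes "same" padding, so output size is just input size divided by stride):

```python
def conv2d_cost(in_ch, out_ch, kernel, h, w, stride=1):
    """Rough parameter and MAC counts for one Conv2d layer.

    Assumes square kernels and "same" padding; returns (params, macs).
    """
    params = in_ch * out_ch * kernel * kernel + out_ch  # weights + biases
    out_h, out_w = h // stride, w // stride
    # Each output position performs in_ch * kernel^2 MACs per output channel
    macs = in_ch * out_ch * kernel * kernel * out_h * out_w
    return params, macs

# Example: a ResNet-style 7x7 stem on a 224x224 RGB image
params, macs = conv2d_cost(3, 64, 7, 224, 224, stride=2)
```

Summing these estimates across layers shows where compute concentrates (usually early, high-resolution layers) versus where parameters concentrate (usually late, high-channel or dense layers), which guides where to trim before scaling up.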