Neural Network Designer
Guides the design of neural network architectures for various tasks including image classification, sequence modeling, generative models, and multimodal systems. Covers layer selection (dense, convolutional, recurrent, attention), activation functions, regularization techniques (dropout, batch norm, weight decay), optimizer selection, learning rate scheduling, and systematic debugging of training issues.
Usage
Describe your task (classification, generation, detection, etc.), input data format and dimensions, available compute resources, and performance targets. Specify any constraints on model size or inference latency. This skill recommends an architecture with layer configurations, training hyperparameters, and a debugging checklist for common training failures.
Examples
- "Design a CNN architecture for classifying 224x224 medical images into 15 disease categories with limited training data"
- "Create a transformer encoder architecture for sequence classification that fits under 50MB and runs on mobile devices"
- "Architect a U-Net variant for semantic segmentation of satellite imagery with 10 land-use classes"
Guidelines
- Start with proven architectures (ResNet, EfficientNet, BERT) and modify rather than building from scratch
- Use transfer learning from pre-trained models when your dataset has fewer than 10K labeled examples
- Apply batch normalization or layer normalization to stabilize training and allow higher learning rates
- Use dropout (0.1-0.5) and weight decay (1e-4 to 1e-2) together to prevent overfitting
- Start with Adam optimizer (lr=3e-4) as a baseline; switch to AdamW or SGD+momentum for fine-tuning
- Implement learning rate warmup for transformer models and cosine annealing for longer training runs
- Monitor gradient norms during training to detect vanishing or exploding gradient problems early
- Profile memory usage and compute per layer to identify bottlenecks before scaling up the model
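The warmup-plus-cosine-annealing schedule recommended above can be sketched as a framework-agnostic function. The name `lr_at_step` and the default values (base lr of 3e-4 from the Adam guideline, 1,000 warmup steps, a 1e-6 floor) are illustrative choices, not fixed prescriptions; in PyTorch the same effect is typically achieved by combining built-in schedulers from `torch.optim.lr_scheduler`.

```python
import math

def lr_at_step(step, base_lr=3e-4, warmup_steps=1000,
               total_steps=100_000, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        # Linear ramp: avoids unstable large updates in early transformer training
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup training completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Cosine decay from base_lr at progress=0 to min_lr at progress=1
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Plotting `lr_at_step` over the planned number of steps before training is a cheap sanity check that the schedule peaks and decays where you expect.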
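For the gradient-norm monitoring guideline, the quantity usually logged is the global L2 norm over all parameter gradients. A minimal pure-Python sketch (the function name `global_grad_norm` is hypothetical) is below; in PyTorch, `torch.nn.utils.clip_grad_norm_` computes and clips this same norm in one call.

```python
import math

def global_grad_norm(grads):
    """Global L2 norm over all parameter gradients.

    `grads` is a list of flat lists of gradient values, one per parameter
    tensor. A norm trending toward zero suggests vanishing gradients;
    sudden spikes suggest exploding gradients.
    """
    total = 0.0
    for g in grads:
        for v in g:
            total += v * v
    return math.sqrt(total)
```

Logging this value every step (or every few steps) makes both failure modes visible long before the loss curve reveals them.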
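The per-layer profiling guideline can be approximated on paper before any profiler runs: parameter count and multiply-accumulate (MAC) count per layer identify the likely bottlenecks. A rough sketch for a single 2-D convolution (the function `conv2d_cost` is illustrative and assumes "same" padding, so output size is just input size divided by stride):

```python
def conv2d_cost(in_ch, out_ch, kernel, h, w, stride=1):
    """Rough parameter and MAC counts for one Conv2d layer.

    Assumes square kernels and "same" padding; returns (params, macs).
    """
    params = in_ch * out_ch * kernel * kernel + out_ch  # weights + biases
    out_h, out_w = h // stride, w // stride
    # Each output position performs in_ch * kernel^2 MACs per output channel
    macs = in_ch * out_ch * kernel * kernel * out_h * out_w
    return params, macs

# Example: a ResNet-style 7x7 stem on a 224x224 RGB image
params, macs = conv2d_cost(3, 64, 7, 224, 224, stride=2)
```

Summing these estimates across layers shows where compute concentrates (usually early, high-resolution layers) versus where parameters concentrate (usually late, high-channel or dense layers), which guides where to trim before scaling up.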