A/B Test Analyzer
Design and analyze A/B tests with proper statistical methodology.
Usage
- Form a clear hypothesis: "Changing X will improve Y by Z%"
- Calculate required sample size based on baseline rate, minimum detectable effect, significance level, and statistical power
- Determine test duration and traffic allocation
- Monitor for technical issues without peeking at results
- Analyze results with proper statistical tests and make a decision
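The sample size and duration steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming a two-sided two-proportion z-test and the standard normal-approximation formula; the function names (`sample_size_per_variant`, `duration_days`) and their parameters are illustrative, not part of any particular library.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant, daily_visitors, n_variants=2, full_weeks=True):
    """Days needed to collect the sample, optionally rounded up to whole weeks."""
    days = math.ceil(n_per_variant * n_variants / daily_visitors)
    if full_weeks:
        days = math.ceil(days / 7) * 7  # cover every day of the week equally
    return days


# Illustrative inputs: 5% baseline, 10% relative lift, 1,000 visitors/day
n = sample_size_per_variant(baseline=0.05, relative_mde=0.10)
days = duration_days(n, daily_visitors=1000)
print(f"{n} visitors per variant, about {days} days")
```

Other calculators may give slightly different numbers because they use different approximations (for example, pooled variance or a continuity correction), so treat the output as a planning estimate.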
Examples
- Sample size calculation: Baseline conversion rate: 5%. Minimum detectable effect: 10% relative (0.5 percentage points). Confidence level: 95% (alpha = 0.05). Power: 80%. Required sample: ~31,000 per variant. At 1,000 visitors/day with a 50/50 split (500 per variant per day), the test needs about 62 days, or 63 days when rounded up to full weeks. If that's too long, either increase traffic or accept a larger minimum detectable effect
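- Analyzing results: Control: 5,000 visitors, 250 conversions (5.0%). Variant: 5,000 visitors, 280 conversions (5.6%). Relative lift: +12%. P-value: 0.18 (two-sided two-proportion z-test). NOT statistically significant (p > 0.05). Decision: do not ship the variant on this evidence. The apparent improvement could be due to chance: either the effect isn't real, or the test needs more data to detect it. With only 5,000 visitors per variant, the test is underpowered for a lift of this size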
- Segmented analysis: Overall result: no significant difference. But segment by device: mobile shows +15% (significant), desktop shows -5% (not significant). This suggests a mobile-specific improvement, but treat it as a hypothesis rather than a finding, because every extra segment is another comparison and inflates the false positive rate. Validate with a follow-up mobile-only test before concluding
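The "Analyzing results" example above can be reproduced with a short script. This is a minimal sketch, assuming a two-sided z-test with a pooled standard error and a normal-approximation confidence interval for the difference; `two_proportion_ztest` is an illustrative helper written for this example, not a library function.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test comparing conversion rates of control (a) and variant (b)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_diff
    return {
        "relative_lift": (p_b - p_a) / p_a,
        "z": z,
        "p_value": p_value,
        "ci": (p_b - p_a - margin, p_b - p_a + margin),
        "significant": p_value < alpha,
    }


# Numbers from the example: 250/5,000 (control) vs 280/5,000 (variant)
result = two_proportion_ztest(250, 5000, 280, 5000)
print(f"lift={result['relative_lift']:.1%}  p={result['p_value']:.2f}  "
      f"significant={result['significant']}")
```

For segmented analysis, one conservative option is to pass a stricter threshold such as `alpha / k` when examining `k` segments (a Bonferroni correction). That is an assumption of this sketch rather than something the examples above prescribe.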
Guidelines
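- Never peek at results before reaching your pre-calculated sample size. Repeatedly checking a fixed-horizon test and stopping at the first significant result can inflate the false positive rate from 5% to 30% or more, depending on how often you look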
- If you must check early, use sequential testing methods (for example, always-valid p-values) instead of fixed-horizon tests
- Test one change at a time. If you change headline AND button color, you can't attribute the effect to either
- Run tests for full weeks (7, 14, 21 days) to account for day-of-week effects
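- A "non-significant" result is still a result, but it is not proof of no effect: it tells you that any effect is probably smaller than the test was powered to detect. Check the confidence interval before deciding the change isn't worth further investment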
- Document every test: hypothesis, variants, sample size, duration, result, decision. Build an institutional testing knowledge base
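The warning about peeking can be checked with a small simulation of A/A tests, where both arms share the same true conversion rate and every "significant" result is therefore a false positive. This is a rough sketch under illustrative assumptions (500 simulated tests, 2,000 visitors per arm, one look every 200 visitors, 5% conversion rate); the exact inflation depends on how many looks are taken.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled z-test for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variation yet, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_aa_tests(n_sims=500, n_per_arm=2000, look_every=200, rate=0.05, seed=0):
    """Return (fixed-horizon, peeking) false positive rates over simulated A/A tests."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        any_look_significant = False
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # A peeker checks at every interim look and would stop at the first "win"
            if i % look_every == 0 and is_significant(conv_a, conv_b, i):
                any_look_significant = True
        peeking_hits += any_look_significant
        # The disciplined analyst tests once, at the planned sample size
        fixed_hits += is_significant(conv_a, conv_b, n_per_arm)
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = simulate_aa_tests()
print(f"fixed-horizon: {fixed_rate:.1%}  with peeking: {peeking_rate:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate should be noticeably higher. Adding more looks (a smaller `look_every`) pushes it higher still.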