idea selection
the principle is that not every idea is worth testing.
- quantitative analysis using historical data to obtain the opportunity sizing of each idea
- qualitative analysis with focus groups and surveys
Designing A/B Tests
test duration
sample size can be estimated with formula:
if sample variance is large, then more samples are needed and vice versa.
delta is the minimum detectable effect, which should be decided by multiple stakeholders.
test duration is estimated by dividing the sample size by the number of users in each group(control and treatment group), but more is almost always better than not enough.
Interference between control and treatment groups
-
for social networks, the treatment effect underestimates the actual effect. solution to eliminate interference effect: 1) create network clusters to represent groups of users who are more likely to interact with people within the group than people outside of the group 2) Ego-cluster randomization
- in two-sided markets the treatment effect overestimates the actual effect
solution to eliminate interference effect:
1) Geo-based randomization
2) time-based randomization
Analyzing Results
- If an A/B test has a larger or smaller initial effect, it’s probably due to novel or primacy effects. to address this issue: 1) completely rule out the possibility of those effects by running tests only on first-time users.
- when testing with multiple variants to see which one is the best amongst all the features, the probability of a false positive (or Type I error) is higher than single variant.
to address this issue:
1) Bonferroni correction: divides the significance level 0.05 by the number of tests
2) control the false discovery rate (FDR): FDR = E[# of false positive / # of rejections]
Making Decisions
make decision focusing on the current objective if we see contradicting results in different metrics.