Alert Policies are currently in beta.
Please contact firstname.lastname@example.org if you would like more information or if you would like early access.
For alert policies, rather than testing for statistically significant evidence of any impact (as we do for our standard metric analyses), we test for significant evidence of an impact that exceeds your chosen degradation threshold in the direction opposite to the metric’s desired direction.
This means that, by design, an alert will not fire if your observed impact exactly equals the threshold. For an alert to fire, the observed degradation must be a certain amount more extreme than the threshold you’ve chosen. Exactly how much more extreme (sometimes called the Minimum Detectable Effect) depends on the statistical power of the metric, which is driven primarily by the sample size and the variance of the metric values.
For example, imagine you have a ‘Percentage of Unique Users’ metric with a value of 60% in the baseline treatment, and you use a degradation threshold of 10%. If the desired direction of the metric is a decrease, then we would be testing for evidence that the Percentage of Unique Users in the comparison group is above 66% (a relative 10% increase over the baseline value: 60% × 1.10 = 66%).
Assuming a 50/50 percentage rollout of users between the baseline and comparison treatments, and an org-wide significance level of 0.05, with 10,000 unique users an alert would fire only if the observed percentage in the comparison group exceeded 68%. If instead you had 1,000 or 100,000 unique users, the comparison group value would need to exceed 73% or 66.7%, respectively, for an alert to be raised.
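The way the alerting boundary tightens as sample size grows can be sketched with a one-sided two-proportion z-test against a null hypothesis shifted by the degradation threshold. This is only an illustrative approximation: the function name is hypothetical, the z-value of 1.96 (for a 0.05 significance level) is an assumption, and a simple z-test will not exactly reproduce the figures above if the actual analysis uses a different procedure.

```python
import math

def min_alerting_rate(p_baseline, rel_threshold, n_per_arm, z=1.96):
    """Smallest observed comparison-group rate that would trigger an alert,
    under an (assumed) one-sided two-proportion z-test of the shifted null
    H0: p_comparison <= p_baseline * (1 + rel_threshold).

    This is a sketch for intuition, not the product's actual test.
    """
    margin = p_baseline * rel_threshold  # absolute degradation threshold, e.g. 0.06
    # The standard error depends on the unknown comparison rate, so solve
    # (p2 - p_baseline - margin) / SE(p1, p2) = z by fixed-point iteration.
    p2 = p_baseline + margin
    for _ in range(20):
        se = math.sqrt(p_baseline * (1 - p_baseline) / n_per_arm
                       + p2 * (1 - p2) / n_per_arm)
        p2 = p_baseline + margin + z * se
    return p2

# 10,000 users split 50/50 -> 5,000 per arm; boundary is roughly 68%.
print(min_alerting_rate(0.60, 0.10, 5000))   # ≈ 0.68
# More users -> boundary closer to the 66% threshold; fewer -> further away.
print(min_alerting_rate(0.60, 0.10, 50000))
print(min_alerting_rate(0.60, 0.10, 500))
```

The key intuition the sketch captures is that the gap between the threshold (66%) and the alerting boundary shrinks roughly with the square root of the sample size.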
Hence, we recommend setting an alert threshold that is less extreme than any degradation you would definitely want to be alerted about. Choose a threshold close to the boundary between a safe or acceptable degradation and one you would want to know about.