Split has updated the Metrics impact tab to show a metric card as statistically significant when its p-value is below 0.05 (or below your organization's significance threshold, if the default has been changed), even if the metric has not reached 80% power.
Previously, the power threshold took priority: even if the p-value was below 0.05, a card with less than 80% power was not shown as statistically significant. Instead, it appeared as a grey card.
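The difference between the old and new rule can be sketched as follows. This is only an illustration of the logic described above, not Split's implementation: the function names are hypothetical, and the defaults of 0.05 and 0.80 are the values mentioned in this article.

```python
def was_significant_before(p_value, power, alpha=0.05, power_threshold=0.80):
    """Old rule (as described above): both conditions had to hold."""
    return p_value < alpha and power >= power_threshold


def is_significant_now(p_value, alpha=0.05):
    """New rule: only the organization's significance threshold matters."""
    return p_value < alpha


# A card with p = 0.03 and 60% power was grey before and is significant now.
before = was_significant_before(p_value=0.03, power=0.60)
after = is_significant_now(p_value=0.03)
print(before, after)
```

If your organization uses a different significance threshold, pass it as `alpha` instead of relying on the default.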
Why the change?
Cards that showed a p-value below 0.05 but were not marked as statistically significant confused some users. With this change, the organization's significance setting is the primary driver, since it matters more for decision making. The power threshold is now used only for the estimates shown on 'needs more data' cards.
What do you need to be aware of?
Some previously grey cards may turn green or red. Treat the figures on 'needs more data' cards as estimates, not definitive predictions. Actual results can differ from them, in part because of the way events flow into Split, particularly when variance is high and anomalous data could swing a treatment. For example, a 'needs more data' card may indicate that you need 1000 more users to reach statistical significance (based on 80% power). In reality, fewer users may be needed: you could see statistically significant results with just 200 more users, but at 50% power rather than 80%.
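To get a feel for why a lower power target needs fewer users, here is a rough sketch based on the textbook normal-approximation sample size formula, where the required sample size is proportional to (z for significance + z for power) squared. This is an assumption chosen for illustration and is not necessarily the calculation Split uses; the function name is hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(power, alpha=0.05):
    """Sample size up to a constant factor, for a two-sided z-test (assumed model)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = std_normal.inv_cdf(power)          # extra margin needed for the power target
    return (z_alpha + z_power) ** 2


# Total sample needed at 50% power relative to 80% power, same effect and variance.
ratio = relative_sample_size(0.50) / relative_sample_size(0.80)
print(f"{ratio:.2f}")
```

Under this approximation the total sample needed for 50% power is roughly half of what 80% power requires. How that translates into additional users depends on how many users the experiment already has.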
The p-value determines whether a result is statistically significant, and the power indicates how easily we can detect an effect. If an observed difference sits exactly at the threshold of p = 0.05, it is on the boundary between significant and not significant, so it has about a 50% chance of falling on either side of that boundary. Any result with a p-value below 0.05 (assuming that is the organization's setting) will therefore have at least 50% power. The 80% power threshold is used only for estimating the number of days and the Minimum Likely Detectable Effect on 'needs more data' cards.
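The 50% figure can be checked numerically. The sketch below assumes a two-sided z-test and treats the observed effect as if it were the true effect (often called observed or post-hoc power). Both are simplifying assumptions for illustration, not a description of Split's statistics engine, and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z-test if the true effect equals the observed one."""
    z_observed = STD_NORMAL.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)    # significance cut-off
    # Probability that a repeat of the test lands beyond either critical value.
    return (STD_NORMAL.cdf(z_observed - z_critical)
            + STD_NORMAL.cdf(-z_observed - z_critical))


for p in (0.20, 0.05, 0.01):
    print(p, round(observed_power(p), 3))
```

With these assumptions, a p-value exactly at the threshold gives a power of about 0.5, smaller p-values give more, and larger p-values give less.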