Population Sample Size
When running an experiment, it is important to ensure your sample is large enough to detect impacts of the size that matter to you. If your sample size is too small, your experiment will be underpowered and unlikely to detect an impact of a meaningful size.
Each metric in your experiment has a Minimum Likely Detectable Effect (MLDE): the smallest change that, if it exists, is likely to be detected and shown as statistically significant. Impacts smaller than the MLDE may be missed and fail to reach significance because the sample is too small to confidently distinguish them from random noise.
The larger your sample, the smaller the impacts your experiment will be able to detect. This is often a trade-off between speed (not having to run the experiment longer to collect a larger sample) and sensitivity (being able to detect smaller changes).
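To illustrate this trade-off, here is a minimal Python sketch that approximates the MLDE of a means metric for a given sample size. It assumes a standard two-sided, two-sample z-test with an even split between two treatments, a 0.05 significance threshold and 80% power. The function name relative_mlde and the baseline values are hypothetical, and Split's own calculation may differ.

```python
from math import sqrt
from statistics import NormalDist

# Assumed: two-sided test at alpha = 0.05 with 80% power, even 50/50 split.
Z_ALPHA = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96
Z_POWER = NormalDist().inv_cdf(0.80)          # about 0.84


def relative_mlde(baseline_mean, baseline_sd, n_per_treatment):
    """Approximate relative MLDE of a means metric (normal approximation)."""
    # Smallest absolute difference likely to be detected between two groups.
    absolute = (Z_ALPHA + Z_POWER) * sqrt(2 * baseline_sd ** 2 / n_per_treatment)
    # Express it relative to the baseline value.
    return absolute / baseline_mean


# Hypothetical metric: baseline mean 10, standard deviation 5.
for n in (1_000, 4_000, 16_000):
    print(n, f"{relative_mlde(10.0, 5.0, n):.2%}")
```

Under this approximation the MLDE shrinks with the square root of the sample size, so quadrupling the sample per treatment halves the smallest change you are likely to detect.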
Using the calculators below, you can see how long you need to run your experiment to have a good chance of detecting a given effect size, if it exists. If you do not know the baseline metric value, standard deviation, or expected sample size per day, you can measure them by running a preliminary split with your intended targeting rules but with all traffic receiving your default treatment. If your metric is a count, sum, average, or ratio metric, use the first calculator, for means metrics. If it is a percent of unique users metric, use the second calculator, for proportions. Note that these calculators assume a significance threshold of 0.05 and a power threshold of 80%.
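The calculators' exact formulas are not given here, but the standard normal-approximation sample-size formulas give a feel for what they compute. The sketch below assumes a two-sided test at 0.05, 80% power and an even split between two treatments. The function names are hypothetical, and the proportions version uses the common unpooled-variance approximation, which may differ slightly from Split's implementation.

```python
from math import ceil
from statistics import NormalDist

# Thresholds stated for the calculators; a two-sided test is assumed.
ALPHA = 0.05
POWER = 0.80
Z_SUM = NormalDist().inv_cdf(1 - ALPHA / 2) + NormalDist().inv_cdf(POWER)


def n_per_treatment_means(baseline_mean, baseline_sd, relative_change):
    """Users needed in each treatment to detect a relative change in a mean.

    relative_change is a fraction, e.g. 0.05 for a 5% change.
    """
    delta = baseline_mean * relative_change  # absolute change to detect
    return ceil(2 * (Z_SUM * baseline_sd / delta) ** 2)


def n_per_treatment_proportions(baseline_rate, relative_change):
    """Users needed in each treatment to detect a relative change in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_change)
    # Unpooled variance: each group keeps its own p(1 - p).
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(Z_SUM ** 2 * variance_sum / (p2 - p1) ** 2)


# Hypothetical means metric: baseline 10, standard deviation 5, 5% change.
per_treatment = n_per_treatment_means(10.0, 5.0, 0.05)
print(per_treatment, 2 * per_treatment)  # per treatment, then total

# Hypothetical proportion metric: 10% baseline rate, 10% relative change.
print(n_per_treatment_proportions(0.10, 0.10))
```

For the hypothetical means metric, detecting a 5% relative change needs about 1,570 users per treatment, or roughly 3,140 in total across both treatments.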
Using the calculators
If you have the experiment pack and are unsure of any of the data required for the calculators, we recommend looking at the metric results for a similar split you have already run, or running a "100% off" split with your intended targeting rules. You can then find the sample size, metric value and standard deviation in the Metric Details and Trends view, reached by clicking into the metric card.
Expected sample size per day
This is the total number of users expected to enter your experiment's targeting rule each day, that is, your Daily Active Users (DAU). It is the total across both treatments, not the number per treatment.
You can find the sample size in the Sample size column, under the Sample population section of the data table. Because this figure covers the whole period the split has run, you may need to divide it by the number of days to get a daily estimate.
For example, imagine you see the table below for a split that ran for a full week. To estimate the sample size per day, first sum the sample sizes across the two treatments to get 2000, then divide by 7 to get an estimated daily value of about 285 users.
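The same arithmetic as a short Python sketch. The function name is hypothetical, and the per-treatment counts are illustrative only; just their 2000 total comes from the example. Floor division rounds the estimate down to 285, which errs towards a longer recommended run time.

```python
def daily_sample_size(sample_sizes_by_treatment, days_run):
    """Estimate users entering the split per day, summed across treatments."""
    total = sum(sample_sizes_by_treatment)
    # Round down so the estimate errs towards a longer recommended run time.
    return total // days_run


# Hypothetical per-treatment counts summing to the 2000 users in the example.
print(daily_sample_size([1000, 1000], days_run=7))  # 285
```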
Baseline Metric Value
This is the expected value of the metric in your control group, or the value you expect to see for the treatment set as the baseline. If you are using a reference split, you can find an estimate for this value in the Metric Details and Trends view: use the number under the Mean column in the Metric Dispersion section of the data table.
Baseline Standard Deviation
The standard deviation characterises how much variation there is in your metric. It is needed for the means calculator but not for the percent-unique (proportions) calculator.
You can find this value under the Stdev column in the Metric Dispersion section of the data table.
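The percent-unique calculator can skip this input because, for a metric where each user either did or did not do something (a 0/1 outcome), the standard deviation is fully determined by the baseline rate p as sqrt(p(1 - p)). The sketch below checks this with hypothetical data; it uses the population standard deviation, which is what that formula gives.

```python
from math import isclose, sqrt
from statistics import pstdev

# Hypothetical outcomes for a percent-unique metric: 1 if the user
# performed the action at least once during the experiment, else 0.
outcomes = [1] * 30 + [0] * 70

p = sum(outcomes) / len(outcomes)  # baseline rate, 0.3 here
sd_from_data = pstdev(outcomes)    # population standard deviation of the 0/1 values
sd_from_rate = sqrt(p * (1 - p))   # standard deviation implied by the rate alone

print(sd_from_data, sd_from_rate, isclose(sd_from_data, sd_from_rate))
```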
What size (relative%) change do you want to be able to detect?
This is the smallest change, expressed as a percentage of the baseline value, that is likely to be detected and shown as statistically significant if it exists. Impacts smaller than this may be missed and fail to reach significance. Enter the smallest change to your metric that you would definitely want to know about.
Days in your typical seasonality cycle
The number of days in your seasonality cycle is similar to your review period: it is the length of time needed to capture a representative set of users. For example, if your business-level metrics typically vary across the days of the week, you should use a seasonality cycle and review period of at least a week.
Split encourages making decisions only after full review periods, to help you account for seasonality in your data. For example, if your review period is set to 14 days, then even if you have enough sample after 12 days, we still recommend running your experiment for the full 14-day review period. These calculators round the recommended run time up to the next full seasonality cycle.
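The rounding described above can be sketched as follows. It assumes the run time is first computed as the total sample needed divided by the expected daily sample, rounded up to whole days, and then rounded up to a whole number of seasonality cycles. The function name and inputs are hypothetical, not Split's implementation.

```python
from math import ceil


def recommended_run_days(total_sample_needed, daily_sample_size, cycle_days):
    """Days to run so the sample target is met using only full cycles."""
    # Days needed to collect the sample, rounded up to whole days.
    days_needed = ceil(total_sample_needed / daily_sample_size)
    # Round up to the next full seasonality cycle (review period).
    full_cycles = ceil(days_needed / cycle_days)
    return full_cycles * cycle_days


# Hypothetical inputs: 3140 users needed, 285 users per day, 14-day cycle.
# The sample target is reached after 12 days, but the run is extended to 14.
print(recommended_run_days(3140, 285, 14))  # 14
```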