The timestamp is recorded when the track call is made, so regardless of when an event arrives at Split, it is included in any subsequent calculations for the current version of a split. A number of factors must be balanced in the Split pipeline: speed, stability, accuracy, comprehensiveness, and so on.
With the above in mind, Split aims for the fastest possible time to value in its pipeline, but has made architectural decisions that favor stability and accuracy. For experimentation, it is far more important that the platform operates on the most accurate information available.
1) SDK to API: The SDK sends events in batches, as configured by a parameter in the SDK configuration (for example, `eventFlushIntervalInMillis` in the Java SDK). The default is stated in each SDK's documentation; for most SDKs it is 30 seconds. Another parameter sets the event queue size (`eventsQueueSize` in the Java SDK, which defaults to 500). If the event queue fills up, the SDK flushes its contents and posts the events regardless of the first parameter's setting.
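The two flush triggers above can be sketched as follows. This is a minimal illustration, not the actual SDK implementation: the `EventBuffer` class and its `post` callback are invented for the example, while the parameter defaults mirror the values quoted above.

```python
import time

class EventBuffer:
    """Hypothetical sketch of the SDK's two flush triggers: a periodic
    timer (eventFlushIntervalInMillis) and a queue-size cap (eventsQueueSize)."""

    def __init__(self, flush_interval_ms=30_000, queue_size=500, post=print):
        self.flush_interval_ms = flush_interval_ms
        self.queue_size = queue_size
        self.post = post              # stand-in for the HTTP POST to the events API
        self.queue = []
        self.last_flush = time.monotonic()

    def track(self, event):
        # The timestamp is captured when track() is called, not on arrival.
        self.queue.append({"event": event, "timestamp": time.time()})
        if len(self.queue) >= self.queue_size:
            self.flush()              # queue full: flush regardless of the timer
        elif (time.monotonic() - self.last_flush) * 1000 >= self.flush_interval_ms:
            self.flush()              # interval elapsed: flush on schedule

    def flush(self):
        if self.queue:
            self.post(self.queue)     # batch-post everything buffered so far
            self.queue = []
        self.last_flush = time.monotonic()
```

With a small queue size, the size trigger fires long before the timer does, which is exactly the behavior described above for high-volume applications.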
2) API to Queue: The API writes data to an ingestion queue; this typically takes only a few milliseconds.
3) Queue to S3: Uses the shortest possible buffer time for AWS Firehose, which is 60 seconds.
4) S3 to Data Lake: A job retrieves the raw event data and processes it for storage in the Split data lake, ready for analysis. The typical run time for this job varies from 2 to 5 minutes, depending on load.
5) Data Lake to Calculated Result: Jobs are scheduled based on how long the experiment has been running, since older experiments have low variance over small windows of data. Once a job begins, it takes from 15 seconds to 5 minutes to process and save the results, depending on the data volumes involved.
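The age-based scheduling in step 5 can be illustrated with a small function. To be clear, the thresholds and intervals below are invented for illustration; the actual cadence Split uses is internal. The sketch only captures the stated principle: young experiments are recalculated often, while mature ones, whose results barely move over small windows of new data, are recalculated less frequently.

```python
from datetime import timedelta

def recalculation_interval(experiment_age: timedelta) -> timedelta:
    """Hypothetical age-based schedule for result-calculation jobs.
    Thresholds and intervals are illustrative, not Split's actual values."""
    if experiment_age < timedelta(days=1):
        # New experiment: metrics still moving quickly, recalculate often.
        return timedelta(minutes=15)
    if experiment_age < timedelta(days=7):
        # Settling in: moderate cadence.
        return timedelta(hours=1)
    # Mature experiment: low variance over small windows, so space runs out.
    return timedelta(hours=6)
```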
The last run time shown in the application is based on when the job completes. Typically, it is fair to estimate about 5 minutes of pipeline delay from the time an event is received to the time it is available for processing. In many cases the pipeline is faster than that: both the SDK and Firehose buffers behave like trains leaving a station, so if you arrive just before one departs, you barely wait. Spikes in data volume can add delay to the pipeline.
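The "about 5 minutes" figure can be sanity-checked with back-of-envelope arithmetic using the numbers quoted above, measuring from API receipt to availability in the data lake. A buffered stage is the "train" in the analogy: an event arriving at a uniformly random moment waits half the buffer period on average, and the full period at worst. The few-millisecond queue write is ignored here.

```python
# Rough delay estimate from API receipt to data-lake availability,
# using the figures quoted in steps 3 and 4 above.

FIREHOSE_BUFFER_S = 60       # minimum AWS Firehose buffer time
LAKE_JOB_S = (120, 300)      # data-lake job run time: 2 to 5 minutes

def delay_estimate_s():
    # Average: half the buffer period plus the midpoint of the job range (~240 s).
    avg = FIREHOSE_BUFFER_S / 2 + sum(LAKE_JOB_S) / 2
    # Worst case: full buffer period plus the slowest job run (360 s).
    worst = FIREHOSE_BUFFER_S + LAKE_JOB_S[1]
    return avg, worst
```

The average lands around 4 minutes and the worst case around 6, which is consistent with the 5-minute rule of thumb; the SDK's own 30-second buffer sits upstream of this window, since it elapses before the event is received.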