Example: basic Python code that generates an events Parquet file to integrate Amazon S3 with Split. To learn more about this integration, refer to the Amazon S3 integration guide.
Environment:
- Python 3.7
- Pandas 1.2.2
- Pyarrow 3.0.0
How to use:
- Before running the code below, replace the variables declared in the top section, as well as the customer key, event value, and property names and values.
- Note that the timestamp is in Epoch milliseconds.
- Append as many events as needed by repeating the df = df.append(...) call (a sketch follows the code).
- The resulting Parquet file can then be copied to the S3 bucket dedicated to the Split S3 events integration (see the upload sketch after the code).
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
###################################
environmentId = "ENVIRONMENT ID"
eventTypeId = "EVENTTYPE ID"
trafficTypeId = "TRAFFICTYPEID"
parquetFileName = "PARQUETFILE"
###################################
df = pd.DataFrame()
df = df.append(
    {
        'environmentId': environmentId,
        'eventTypeId': eventTypeId,
        'trafficTypeId': trafficTypeId,
        'key': "Customer Key",
        'value': 10.0,
        'timestamp': 1625183279000,  # Epoch milliseconds
        # Each property is a (name, value) pair; the column is written
        # as a map<string, string> using the schema below.
        'properties': [
            ('age', "20"),
            ('country', "US"),
            ('tier', "Premium")
        ]
    }, ignore_index=True)
print("creating parquet file")
# The 'properties' column is stored as a map<string, string>
udt = pa.map_(pa.string(), pa.string())
schema = pa.schema([pa.field('environmentId', pa.string()),
                    pa.field('eventTypeId', pa.string()),
                    pa.field('trafficTypeId', pa.string()),
                    pa.field('key', pa.string()),
                    pa.field('value', pa.float64()),
                    pa.field('timestamp', pa.int64()),
                    pa.field('properties', udt)])
table = pa.Table.from_pandas(df, schema)
pq.write_table(table, parquetFileName)
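As noted above, additional events can be added by repeating the df.append(...) call before the file is written. A minimal sketch, assuming a second event for a hypothetical customer key "user-2"; int(time.time() * 1000) produces the current time in Epoch milliseconds:

import time

# Append a second event row; repeat this block once per event.
df = df.append(
    {
        'environmentId': environmentId,
        'eventTypeId': eventTypeId,
        'trafficTypeId': trafficTypeId,
        'key': "user-2",                       # hypothetical customer key
        'value': 25.0,
        'timestamp': int(time.time() * 1000),  # current time in Epoch milliseconds
        'properties': [
            ('age', "35"),
            ('country', "CA"),
            ('tier', "Free")
        ]
    }, ignore_index=True)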
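To copy the resulting file into the S3 bucket, one option is boto3. A minimal sketch, assuming boto3 is installed, AWS credentials are configured, and "split-s3-events-bucket" is a placeholder for the bucket provisioned for the integration:

import boto3

s3 = boto3.client('s3')
# Upload the local Parquet file to the integration bucket
# ('split-s3-events-bucket' is a placeholder; use your bucket name).
s3.upload_file(parquetFileName, 'split-s3-events-bucket', parquetFileName)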