The Snowplow Unified Log is stored in an S3 bucket, and you are required to write an IAM policy that grants Indicative programmatic access to that bucket.
If there are additional enrichments required, such as joining with user property tables or deriving custom user_ids, please contact us.
Adding a Data Source In Indicative
In Indicative, click on Settings and select Data Sources
Click on New Data Source
- Select Connect via Data Warehouse or Lake
- Select S3 as your data connection and Snowplow as the connection schema and click Connect
- You should see this S3 + Snowplow Overview screen. Click Next
- Sign in to the AWS Management Console and open your IAM console.
- Under the Services dropdown, select S3 under Storage.
- Click on the bucket that contains your Snowplow data.
- Enter the Bucket Name into the Indicative UI.
- Click on your bucket and refer to the bucket structure. Enter the path to your enriched data into the File Path field in the Indicative UI.
In this example, the File Path to put into the Indicative field is /main/enriched/good
- Click Next
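If you only have the full S3 location of your enriched data, the Bucket Name and File Path values can be separated as in the following minimal sketch. The helper name split_s3_uri and the bucket name my-snowplow-bucket are hypothetical and used only for illustration; the leading-slash format of the File Path follows the example above.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket name, file path).

    Hypothetical helper for illustration; not part of Indicative or AWS tooling.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Keep a single leading slash and drop any trailing slash,
    # matching the /main/enriched/good format shown above.
    file_path = "/" + parsed.path.strip("/")
    return parsed.netloc, file_path


# "my-snowplow-bucket" is a made-up bucket name.
bucket, file_path = split_s3_uri("s3://my-snowplow-bucket/main/enriched/good/")
print(bucket)     # my-snowplow-bucket
print(file_path)  # /main/enriched/good
```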
- In this section, click the box that contains the policy to copy it to your clipboard. You will need to use this in step 4 of this section.
- Go back to the AWS Console. Select the bucket and click on the Permissions tab.
- Click on Bucket Policy
- Enter the copied policy from step 1 into the editor and click Save
- Click Next in Indicative.
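For orientation, a read-only S3 bucket policy generally has the shape sketched below. This is an illustration only: always paste the exact policy shown in the Indicative UI. The function name, the principal ARN, and the bucket name are all hypothetical, and the set of actions in the real policy may differ.

```python
import json


def build_read_only_bucket_policy(bucket_name: str, principal_arn: str) -> dict:
    """Build a generic read-only bucket policy document.

    Illustrative sketch only; the policy Indicative provides is authoritative.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowGetObject",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


# Both values below are placeholders, not real Indicative or customer identifiers.
policy = build_read_only_bucket_policy(
    "my-snowplow-bucket",
    "arn:aws:iam::123456789012:role/example-reader",
)
print(json.dumps(policy, indent=2))
```

Comparing the copied policy against this shape can help confirm that it references the correct bucket before you click Save.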
- In the Structured Event Name section, select the field that should be used to derive Indicative event names. Our logic will first look at this field, and if this value is null, it will try to use the event_name field. If that value is also null, then we will look at the event field.
- None - Select this option if you're not using Snowplow's structured events
- For Timestamp, select the field that represents the time that the event was performed. If unsure, leave as derived_tstamp
- For Vendor Name, input the Snowplow vendor names used so we can simplify your event property names
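The event name fallback described in the Structured Event Name step can be sketched as follows. This is a minimal illustration of the lookup order, not Indicative's actual implementation. The function name derive_event_name is hypothetical, se_action is used only as an example of a selected field, and treating only None as null is an assumption.

```python
from typing import Any, Mapping, Optional


def derive_event_name(
    event: Mapping[str, Any], structured_field: Optional[str]
) -> Optional[str]:
    """Return the first non-null value in the documented lookup order.

    Pass structured_field=None to model the "None" option.
    """
    candidates = []
    if structured_field is not None:
        candidates.append(structured_field)
    # Fallbacks named in the step above, in order.
    candidates += ["event_name", "event"]
    for field in candidates:
        value = event.get(field)
        if value is not None:  # assumption: only None counts as null
            return value
    return None


# Selected field is null, so the lookup falls through to event_name.
row = {"se_action": None, "event_name": "link_click", "event": "unstruct"}
print(derive_event_name(row, "se_action"))  # link_click
```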
User Identification (Aliasing)
For more information on User Identification (Aliasing), please refer to this article.
Note: If you prefer not to use aliasing, set the Authenticated ID Type to None and click Next.
- Select the Type for the Unauthenticated ID
- Atomic - This will allow you to choose between the domain_userid and network_userid fields that are part of the standard Snowplow event structure.
We typically recommend domain_userid since this uses a 1st party cookie. Click here for more information.
- Context - If the unauthenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
- Other - If the unauthenticated ID is not covered by either of the options above, please specify where we can find the unauthenticated ID in the data.
- Select the Type for Authenticated ID
- Atomic - Enter the field name that should be used for known users. Typically, it is the user_id field in the raw enriched event archive data.
- Context - If the authenticated ID is part of a Snowplow context, choose this option. Enter the values for Vendor, Name, Version, and Field.
- Other - If the authenticated ID is not covered by either of the options above, please specify where we can find the authenticated ID in the data.
- None - choose this option to skip aliasing.
- Select the Schedule Interval to adjust the frequency at which new data is available in Indicative.
- Set the Schedule Time for when the data should be extracted from your S3 bucket. It is critical that 100% of the data is available by this time to avoid loading partial data.
- Select Next
Waiting for Data
For additional advanced settings, such as excluding certain events and properties, please refer to this page.