Important: As of August 13, 2024, this page will no longer be actively maintained. Please refer to the current version of this content here.
The S3 integration with Analytics is available to Enterprise customers only. If you are interested, please contact us. To use it, you must grant Analytics access to an existing S3 bucket by editing that bucket's policy.
To perform the following steps, you must have administrative access to the AWS Console and to the S3 bucket that holds your data.
If you are integrating Snowplow data via AWS S3, please use our Snowplow Integration documentation instead.
Instructions
You will configure a bucket policy (a resource-based IAM policy) that grants Analytics programmatic access to the relevant S3 bucket.
In Analytics
1. In Analytics, click on the gear icon and select Project Settings.
2. Select the Data Sources tab.
3. Select New Data Source.
4. Select Amazon S3 and Define your own schema, then click Connect.
5. You should see the S3 connection setup screen.
6. Click Next.
Connection Information
- Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
- For Source Format, select the format of the files in your S3 bucket.
- For Bucket Name, enter the name of the S3 bucket Analytics should connect to. Click Show Me on the right to see where to find this information.
- For File Path, enter the path to the data you want to use in Analytics. Click Show Me on the right to see where to find this information.
- Click Next
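Bucket Name and File Path together identify the objects Analytics reads, and the two values are commonly written as a single `s3://bucket/path` URI. The sketch below checks a bucket name against a simplified subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starting and ending with a letter or digit) and then builds that URI. `is_valid_bucket_name` and `s3_uri` are hypothetical helpers for illustration, not part of Analytics, and the bucket name in the example is made up.

```python
import re

# Simplified subset of the S3 bucket naming rules (assumption: rules such as
# reserved prefixes and suffixes are not checked here).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a simplified check of S3 bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Adjacent periods and names formatted like IPv4 addresses are not allowed.
    if ".." in name or _IPV4_RE.fullmatch(name):
        return False
    return True


def s3_uri(bucket: str, file_path: str) -> str:
    """Combine the Bucket Name and File Path values into an s3:// URI."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    # Strip leading slashes so "/events/" and "events/" yield the same URI.
    return f"s3://{bucket}/{file_path.lstrip('/')}"


if __name__ == "__main__":
    print(s3_uri("my-analytics-data", "/events/2024/"))
```

Running it prints `s3://my-analytics-data/events/2024/`. Validating before building the URI surfaces typos in the bucket name early, before Analytics attempts to connect.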
Grant Permissions
1. Click the box containing the policy to copy it to your clipboard. You will paste it in step 4.
2. Go back to the AWS Console, select the bucket, and click the Permissions tab.
3. Click Bucket Policy.
4. Paste the policy you copied in step 1 into the editor and click Save.
5. Click Next in Analytics.
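For orientation, a read-only S3 bucket policy typically pairs `s3:ListBucket`, a bucket-level action scoped to the bucket ARN, with `s3:GetObject`, an object-level action scoped to object ARNs. The sketch below builds a policy of that shape in Python purely as an illustration. The principal ARN (`arn:aws:iam::123456789012:role/hypothetical-analytics-reader`) and the exact set of actions are assumptions; the policy Analytics generates may differ, so always paste the policy copied from Analytics rather than one you write yourself.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str, prefix: str = "") -> str:
    """Build an illustrative bucket policy granting read-only access.

    Hypothetical helper: the principal ARN and actions are assumptions,
    not the policy Analytics actually generates.
    """
    # Object ARNs cover every key under the (optional) prefix.
    object_arn = f"arn:aws:s3:::{bucket}/{prefix.lstrip('/')}*"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "AllowList",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject is an object-level action, so it targets object ARNs.
                "Sid": "AllowRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": object_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(read_only_bucket_policy(
        "my-analytics-data",
        "arn:aws:iam::123456789012:role/hypothetical-analytics-reader",
        "events/",
    ))
```

Splitting the two actions into separate statements reflects the fact that they apply to different resource types; a single statement listing only the object ARN would not allow listing the bucket.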
Event Modeling
- Events Field - enter the name of the field from which Analytics should derive your event names.
- Timestamp - enter the name of the field that holds each event's timestamp, which Analytics uses when querying your data.
- Click Next
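To make the two fields concrete, the sketch below reads one record and pulls out the event name and timestamp using the field names entered above. It assumes JSON Lines input, that numeric timestamps are epoch milliseconds, and that other timestamps are ISO 8601 strings; none of these formats is confirmed by this page, and the field names `event_type` and `ts` are hypothetical.

```python
import json
from datetime import datetime, timezone


def to_event(line: str, events_field: str, timestamp_field: str) -> tuple[str, datetime]:
    """Extract (event name, UTC timestamp) from one JSON Lines record."""
    record = json.loads(line)
    name = record[events_field]
    raw_ts = record[timestamp_field]
    if isinstance(raw_ts, (int, float)):
        # Assumption: numeric timestamps are epoch milliseconds.
        ts = datetime.fromtimestamp(raw_ts / 1000, tz=timezone.utc)
    else:
        # Assumption: other values are ISO 8601 strings; rewrite a trailing "Z"
        # so datetime.fromisoformat accepts it on older Python versions.
        ts = datetime.fromisoformat(str(raw_ts).replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return name, ts.astimezone(timezone.utc)


if __name__ == "__main__":
    print(to_event('{"event_type": "Signed Up", "ts": 1700000000000}', "event_type", "ts"))
```

Normalizing every timestamp to UTC keeps events from different sources comparable when they are bucketed by day or hour.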
User Modeling
For more information on User Identification (Aliasing), please refer to this article.
- If you choose to enable Aliasing:
  - Unauthenticated ID - enter the field used to identify anonymous users.
  - Authenticated ID - enter the field used to identify known users.
- If you choose to disable Aliasing, select Disabled:
  - Unauthenticated ID - enter the field used to identify your users. All users must have a value for this field.
- Press Next
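The difference between the two modes can be illustrated with a simplified model of aliasing: when a record carries both an unauthenticated and an authenticated ID, the anonymous ID is linked to the known user, so that user's earlier anonymous events are attributed to them. The sketch below is a hypothetical illustration of that idea, not Analytics' actual identity-resolution algorithm; the field names `anon` and `user` are made up.

```python
from typing import Optional


def resolve_users(events: list[dict], anon_field: str, auth_field: Optional[str]) -> list[str]:
    """Return the user ID each event is attributed to (simplified illustration).

    With aliasing (auth_field set), an anonymous ID seen alongside an
    authenticated ID is merged into that authenticated user. Without aliasing
    (auth_field=None), every event must carry the unauthenticated ID.
    """
    alias: dict[str, str] = {}
    if auth_field is not None:
        # First pass: learn anonymous -> authenticated mappings (first match wins).
        for e in events:
            anon, auth = e.get(anon_field), e.get(auth_field)
            if anon and auth:
                alias.setdefault(anon, auth)
    resolved = []
    for e in events:
        if auth_field is not None and e.get(auth_field):
            resolved.append(e[auth_field])
        elif e.get(anon_field):
            # Map the anonymous ID to its known user if one was learned.
            resolved.append(alias.get(e[anon_field], e[anon_field]))
        else:
            raise ValueError(f"event has no usable user ID: {e}")
    return resolved


if __name__ == "__main__":
    events = [{"anon": "a1"}, {"anon": "a1", "user": "u1"}, {"anon": "a2"}]
    print(resolve_users(events, "anon", "user"))  # aliasing enabled
    print(resolve_users(events, "anon", None))    # aliasing disabled
```

With aliasing enabled, the first event (anonymous `a1`) is attributed to `u1`; with aliasing disabled, each event keeps its unauthenticated ID, which is why that field must always be populated.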
Scheduling
- Select the Schedule Interval to set how often new data becomes available in Analytics.
- Set the Schedule Time at which data should be extracted from your S3 bucket. All data for the interval must be in the bucket by this time; otherwise Analytics may load partial data.
- Select Save
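Because files that land after the extraction time can be missed, it can help to check upload times against the schedule. The sketch below assumes a daily interval with the Schedule Time expressed in UTC, computes the next extraction time, and flags files whose last-modified time falls after it. The helper names and object keys are hypothetical; in practice the last-modified times would come from your S3 object listings (for example the `LastModified` value S3 reports for each object), which this sketch simply takes as input.

```python
from datetime import datetime, time, timedelta, timezone


def next_extraction(now: datetime, schedule_time: time) -> datetime:
    """Next daily extraction at `schedule_time` (UTC) strictly after `now`.

    Assumes a daily Schedule Interval and a timezone-aware UTC `now`.
    """
    candidate = datetime.combine(now.date(), schedule_time, tzinfo=timezone.utc)
    if candidate <= now:
        # Today's slot has passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def late_files(files: dict[str, datetime], deadline: datetime) -> list[str]:
    """Return (sorted) keys whose last-modified time is after the deadline."""
    return sorted(key for key, modified in files.items() if modified > deadline)


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)
    run = next_extraction(now, time(5, 0))
    files = {
        "events/2024-05-01/part-0.json": datetime(2024, 5, 2, 4, 0, tzinfo=timezone.utc),
        "events/2024-05-01/part-1.json": datetime(2024, 5, 2, 5, 30, tzinfo=timezone.utc),
    }
    print(run, late_files(files, run))
```

Here the extraction runs at 05:00 UTC on May 2, so `part-1.json`, uploaded at 05:30, would arrive too late; moving the Schedule Time later, or speeding up the upstream export, would avoid loading partial data.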
Waiting for Data
Advanced Settings
For additional advanced settings, such as excluding certain events and properties, please refer to this page.
If you have any questions or concerns about the above Integration, please contact your Customer Support Manager, or email support@mparticle.com.