AWS, Snowpipe, Snowflake

Moving data from S3 into Snowflake for use cases such as analysis, corporate reporting, or cross-domain data collaboration is best achieved with Snowpipe.

Snowpipe is an event-based data ingestion service that comes bundled with Snowflake. Snowpipe has two main methods for triggering a data loading process:

  1. Cloud Storage Event Notifications (AWS S3, GCP Cloud Storage, Azure Blob Storage)
  2. Snowpipe’s REST API

This article will focus on option one: data loading is triggered when new data files land in AWS S3. The image below shows how both methods work.

Snowpipe data ingestion. Image by Snowflake documentation

 

Setting up AWS and Snowflake

We will set up the connection between AWS and Snowflake: prepare an S3 bucket, configure permissions for the bucket, and create an IAM role.

Then we will create a storage integration in Snowflake.

Step 1 Creating an S3 bucket

We will create a new bucket for this scenario. Go to the Amazon S3 service and create one there.

Create a new bucket.

For this demo, we can use the default settings.

Create a bucket name. Image by Author
Step 2 Configuring access permissions for the S3 bucket

Set up an IAM policy that will be used for the S3 bucket.

Snowflake requires the following permissions to access files in the S3 bucket:

  • s3:GetBucketLocation
  • s3:GetObject
  • s3:GetObjectVersion
  • s3:ListBucket

Before creating the policy, we need to ensure that the Security Token Service (STS) is active for the region we use. To do that, go to the IAM service and open the Account settings section.

Security Token Service. Image by Author.

Then, to create a new policy, navigate to the Policies page and click Create policy.

Create a new policy in JSON format. 

Fill in the JSON policy, add a policy name, and click Create policy.
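As a reference, below is a minimal sketch of the policy JSON based on the permissions listed above. The bucket name bima-snowflake-tutorial matches the stage URL used later in this tutorial; replace it with your own bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::bima-snowflake-tutorial/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::bima-snowflake-tutorial"
        }
    ]
}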

 

We can use snowflake_access_tutorial as the policy name.

Setup policy information. 
Step 3 Creating an IAM role

This new IAM role will be tied to the IAM policy we created in Step 2. We will then use this role for the Snowflake storage integration.

Go to the Roles section to create a new IAM role.

Create new role. 

Select AWS account → Another AWS account as the trusted entity type. Then enter your AWS account ID. Also, check the Require external ID option and enter 0000 as a placeholder for now; we will update it later in Step 5.

Setup new roles. 

The next step is selecting the permissions policy. We can choose snowflake_access_tutorial here.

Embed permission to the role. 

Give the role a name. We can use snowflake_tutorial as the role name.

Add role detail.

Once the role is created, record the Role ARN value from the Summary section.

Capturing ARN. 
Step 4 Creating a storage integration

Now we configure the connection between S3 and Snowflake by creating an external storage integration that links the IAM role to Snowflake.

We can run the following code in the Snowflake worksheet to do that.

Change STORAGE_AWS_ROLE_ARN to the ARN recorded in the previous step, and fill STORAGE_ALLOWED_LOCATIONS with the S3 bucket URL.
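A minimal sketch of the statement, assuming the integration name S3_role_integration, the snowflake_tutorial role from Step 3, and the bucket used in this tutorial (the AWS account ID is a placeholder):

-- grants Snowflake access to the bucket through the IAM role created earlier
create or replace storage integration S3_role_integration
type = external_stage
storage_provider = 'S3'
enabled = true
storage_aws_role_arn = 'arn:aws:iam::<your_aws_account_id>:role/snowflake_tutorial'
storage_allowed_locations = ('s3://bima-snowflake-tutorial/');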

Creating storage integration.

Run this command to check the integration details and take note of:

  • STORAGE_AWS_IAM_USER_ARN
  • STORAGE_AWS_EXTERNAL_ID
desc integration S3_role_integration;
Storing integration detail. Image by Author.
Step 5 Configuring the IAM role trust relationship

This step requires us to go back to the AWS IAM Roles section. Go to the snowflake_tutorial role. Then, on the Trust relationships tab, click Edit trust relationship and update the policy document with the recorded STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.
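The updated trust policy should look roughly like the sketch below, with the two placeholders replaced by the values noted from the desc integration output:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}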

The setup for S3 and Snowflake is now complete. Next, we will create a pipe that automatically brings data from S3 into Snowflake whenever new data arrives in the bucket.

Creating a Snowpipe

Let’s begin by creating a new database, table, and stage for our landing zone. We can run the following code in the Worksheet to do that:

Create a database

create or replace database S3_db;

Create a table

create or replace table S3_table(files string);

Create a stage

use schema S3_db.public;

create or replace stage S3_stage
url = ('s3://bima-snowflake-tutorial/')
storage_integration = S3_role_integration;

Then we can create a pipe with the code below:

Create a pipe

create or replace pipe S3_db.public.S3_pipe auto_ingest=true as   
copy into S3_db.public.S3_table
from @S3_db.public.S3_stage;

This code creates a pipe named S3_pipe with auto_ingest enabled, which ingests data automatically once it lands in S3. The pipe copies files from the S3 stage into the S3_table in Snowflake.

The next step is to record the notification channel ARN for this pipe and add it to the S3 bucket event notification.

Run this command to get the notification channel ARN (the notification_channel column in the output):

show pipes;
Pipe notification channel ARN. 

Go to the AWS S3 console and select the bucket. Then, open the Properties tab and create an event notification.

Setup event notification. Image by Author.

Fill in the setup with the following values:

Event Info:

  • Name: Auto-ingest Snowflake

Event Types

  • Events: All object create events

Destination

  • Send to: SQS Queue
  • SQS: Add SQS queue ARN
  • SQS queue ARN: Fill in your notification channel ARN here
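Once the event notification is saved, a quick way to verify the pipeline (assuming the names used above) is to check the pipe status and, after dropping a test file into the bucket, query the target table:

-- should report an executionState of RUNNING for the pipe
select system$pipe_status('S3_db.public.S3_pipe');

-- rows from newly uploaded files should appear here after a short delay
select * from S3_db.public.S3_table;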
