AWS Data Warehouse Links and Resources – Cloud, IS & Business Alignment

Data Warehouse

How to design a cloud-based data warehousing solution using Amazon Redshift, the petabyte-scale data warehouse in AWS.

Choosing Sort Keys https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html Sorting enables efficient handling of range-restricted predicates.
Distribution Styles https://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
Tutorial: Tuning Table Design https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables.html in this tutorial, you will learn how to optimize the design of your tables. You will start by creating tables based on the Star Schema Benchmark (SSB) schema without sort keys, distribution styles, and compression encodings.
10 Best Practices for Amazon Redshift Spectrum https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
Amazon Redshift Best Practices for Designing Tables https://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html These design choices also have a significant effect on storage requirements, which in turn affects query performance by reducing the number of I/O operations and minimizing the memory required to process queries.
TICKIT Sample Database Schema https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html

Data Format Parameters for the COPY command https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html
Zero Administration Amazon Redshift Database Loader https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/ With this AWS Lambda function, you simply push files into a variety of locations on Amazon S3 and have them automatically loaded into your Amazon Redshift clusters.

This GitHub provides a collection of scripts and utilities that will assist you in getting the best performance possible from Amazon Redshift. https://github.com/awslabs/amazon-redshift-utils.
Amazon RedShift Parameter Groups can be used to configure database settings such as query timeout and datestyle. https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html.
Analyzing Tables https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html – Automatic analyze is enabled by default.
VACUUM https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html Resorts rows and reclaims space in either a specified table or all tables in the current database. Amazon Redshift automatically sorts data and runs VACUUM DELETE in the background.
Performing a Deep Copy https://docs.aws.amazon.com/redshift/latest/dg/performing-a-deep-copy.html
Run Mixed Workloads with Amazon Redshift Workload Management https://aws.amazon.com/blogs/big-data/run-mixed-workloads-with-amazon-redshift-workload-management/
Working with Concurrency Scaling https://docs.aws.amazon.com/redshift/latest/dg/concurrency-scaling.html– automatically adds additional cluster capacity when you need it to process an increase in concurrent read queries.
Implementing Automatic WLM https://docs.aws.amazon.com/redshift/latest/dg/automatic-wlm.html

What is a data lake? https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
Migrating Your Traditional Data Warehouse to a Modern Data Lake https://www.youtube.com/watch?v=3Xg3yu5xnMY AWS re:Invent 2017: (ABD327)
Building Data Lakes on AWS https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf AWS white paper.
AWS Lake Formation https://aws.amazon.com/lake-formation/ makes it easy to set up a secure data lake in days.