There are many ways to construct a big data flow on AWS, depending on time, skills, budget, objectives and operational support.
The key architectural principle is simplicity; a second is cost control.
Types of Data:
- Transactional – usually SQL, sometimes NoSQL
- Unstructured – web, media, cloud – S3, HDFS on Hadoop, EMR (Hadoop), Spark, Hive
- Log files – Kinesis, Kafka (see the ingest sketch below)
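As a rough sketch of log-file ingestion, the boto3 snippet below pushes a JSON log event into a Kinesis data stream. The stream name `app-logs` and the record fields are illustrative assumptions, not part of these notes.

```python
import json
import boto3

# Hypothetical stream name -- replace with your own Kinesis data stream.
STREAM_NAME = "app-logs"

kinesis = boto3.client("kinesis")

def put_log_record(event: dict) -> None:
    """Send one log event to Kinesis; PartitionKey spreads records across shards."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event.get("user_id", "anonymous"),
    )

put_log_record({"user_id": "u-123", "action": "click", "page": "/home"})
```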
Methods of using Big Data:
- Batch
- Stream/real-time, e.g. clickstreams, IoT
- Machine Learning/Predictive
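For the machine-learning/predictive path, a trained model is typically exposed as an endpoint that application services can call. A minimal sketch, assuming a model is already deployed to a SageMaker endpoint; the endpoint name `churn-model` and the feature values are made up for illustration.

```python
import boto3

# Hypothetical endpoint name -- assumes a model is already deployed on SageMaker.
ENDPOINT_NAME = "churn-model"

runtime = boto3.client("sagemaker-runtime")

def predict(features: list[float]) -> str:
    """Send one row of features as CSV and return the raw prediction."""
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=",".join(str(f) for f in features),
    )
    return response["Body"].read().decode("utf-8")

print(predict([0.3, 12.0, 1.0]))
```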
Delivery:
- Virtual – e.g. Kafka running on EC2
- Managed – EMR (Hadoop cluster with Spark), Kafka, Kinesis
- Serverless – Lambda, Glue, Athena (SQL analytics)
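On the serverless side, Athena runs SQL directly over data sitting in S3. A minimal sketch via boto3, assuming a database `logs_db`, a table `clickstream` and an S3 results bucket already exist (all three names are assumptions):

```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical database, table and results location.
QUERY = "SELECT page, COUNT(*) AS hits FROM clickstream GROUP BY page"

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```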
Architecture Principles
- Decouple the process flow: data → ingest → store → ETL/process → analyse
- Use the right tool for the right job
- Leverage managed and serverless services
- Use event-journal design patterns – immutable data lakes, materialised views
- Cost control
- Connect machine learning models to application services
- Decouple compute from storage (e.g. HBase with S3)
- Tier data between hot and cold needs – HDFS (frequent) → S3 → Glacier (cold); a lifecycle sketch follows this list
- Choice of database is driven by schema, data types, access and retrieval patterns, OLTP vs. OLAP, and transactional vs. storage workloads
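One way to implement the hot/cold tiering principle above is an S3 lifecycle rule that moves aging objects to Glacier and eventually expires them. A sketch, assuming a bucket `my-data-lake` and a `raw/` prefix; the day thresholds are illustrative, not prescriptive.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; transition days are illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold
                ],
                "Expiration": {"Days": 365},  # delete after a year
            }
        ]
    },
)
```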
ETL – produce a normalised view across different data sets and schemas, e.g. Glue crawls and analyses CSV, JSON and Parquet (see the sketch below).
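A minimal ETL sketch of that normalisation step, written in PySpark as it might run on EMR or as a Glue Spark job: read raw CSV, clean and type the columns, and write partitioned Parquet for cheaper columnar queries. The S3 paths and column names (`event_time`, `event_id`, `event_date`) are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("normalise-clickstream").getOrCreate()

# Hypothetical S3 paths and columns -- adjust to your own data set.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://my-data-lake/raw/clickstream/")
)

normalised = (
    raw.withColumn("event_time", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_time"))
       .dropDuplicates(["event_id"])
)

# Columnar Parquet, partitioned by date, is cheaper to scan from Athena or Hive.
(
    normalised.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://my-data-lake/curated/clickstream/")
)
```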