AWS Glue is a metadata catalogue service with Extract-Transform-Load logic. The Glue catalogue is based on Hive and is…
Data flowing into the Data Lake inevitably changes. Changes to data tables are captured by change data capture (CDC). Changes…
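As a rough illustration of the change data capture idea, the sketch below replays a stream of change events against an in-memory table keyed by primary key. The `ChangeEvent` type, its `op`/`key`/`row` fields, and the `apply_changes` helper are hypothetical names chosen for this example; they do not correspond to any particular CDC tool or AWS API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(
    table: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[int, Dict[str, Any]]:
    """Return a new table with the events applied in order.

    The input table is left unmodified.
    """
    result = {key: dict(row) for key, row in table.items()}
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event for key {event.key} has no row")
            result[event.key] = dict(event.row)
        elif event.op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            result.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return result


if __name__ == "__main__":
    current = {1: {"name": "alpha"}}
    captured = [
        ChangeEvent("insert", 2, {"name": "beta"}),
        ChangeEvent("update", 1, {"name": "alpha-v2"}),
        ChangeEvent("delete", 2),
    ]
    print(apply_changes(current, captured))
```

Applying events strictly in order is the key assumption here. A real pipeline would also have to deal with out-of-order or duplicate events, which this sketch does not attempt.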
Amazon Redshift is a petabyte-scale columnar data warehouse that is very efficient at storing raw data and collecting data…
Data products are the end result of file or data movements to the cloud, ETL, processing, de-duplication, curation and storage…
In simple terms, we can identify the differences between Data Lakes and Data Warehouses. Data Lake: A data lake is…
Digital Transformation. Digital transformation is neither a magic solution nor a buffet of word salads. DT is roughly defined as the…
A typical Technology Stack for a Data Lake. S3 as the Golden Source. Snowflake as a corporate Data Share with…
(The ETL engine in the above could be AWS Glue.) There are various ways to define performance and what it means. …
Iceberg Cometh. Open table formats, such as Apache Iceberg, enable scale-out data warehousing directly on a data lake. This architecture…
A data lake is a centralized repository that allows a firm to store structured and unstructured data at any scale.…