The problems with Data Pipelines and the hydration of a Data Lake include: data teams often end up with technical debt surrounding…
In essence, Data Operations is based on DevSecOps or DevOps and applies the same ideas to the life cycle of…
Data Lake: The entire concept of a Data Operations Platform rests on top of a Data Lake. There is no simple…
Data Operations (‘DataOps’) was inspired by the Agile-premised ‘Development Operations’ (‘DevOps’) model. The DevOps model, which usually includes security (DevSecOps),…
The icebergth is hereth. Apache Iceberg is an open-source table format for large-scale data systems, designed to provide efficient and…
Data files or tables are split into smaller units. This is called ‘partitioning’. A partition is usually performed against…
Parquet is a file format standard used in many enterprises. It allows the standardisation of files and provides a common…
Databricks and Snowflake overlap in many areas. Firms deploying both need to clearly demarcate the epics and use case journeys…
A straightforward method to automate data ingestion from S3 buckets (data lake) to a Redshift (data warehouse) cluster, by using…
Data Ingestion Challenges: Data ingestion can be complicated. There are usually a variety of data sources, including both SQL and…