To manage data, whether for migrations or operational updates, organizations have needed to take raw data and parse or process it into both OLTP (online transaction processing) and OLAP (online analytical processing) systems. Medallion data processes are well suited to the cloud, where AWS, Azure, Databricks and other platforms provide virtualised patterns and deployments to ingest, transform and curate raw data with Python, PySpark, or out-of-the-box (OOTB) libraries and pipelines (for example, Azure Data Factory, ADF). The curated data can then be consumed downstream by end users, through tools ranging from Power BI to Jupyter/Colab, or by applications, databases, systems and SaaS products.
- OLTP (Online Transaction Processing) focuses on managing real-time transactions and ensuring data integrity for daily operations, while OLAP (Online Analytical Processing) is designed for complex data analysis and reporting, allowing businesses to derive insights from large datasets. Both systems are essential for effective data management in organizations.
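To make the distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module on a hypothetical `orders` table. It is only an illustration: one in-memory SQLite database stands in for both workloads, whereas in practice OLTP and OLAP usually run on separate systems tuned for each. The OLTP part performs small atomic writes that either fully succeed or fully roll back; the OLAP part runs a read-only aggregation for reporting.

```python
import sqlite3

# Hypothetical schema for illustration only: a single "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
    "region TEXT NOT NULL, amount REAL NOT NULL CHECK (amount >= 0))"
)

def place_order(conn, customer, region, amount):
    """OLTP-style write: one small, atomic transaction per business event.

    Using the connection as a context manager commits on success and
    rolls back if an exception is raised inside the block.
    """
    with conn:
        conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )

place_order(conn, "alice", "EU", 120.0)
place_order(conn, "bob", "US", 80.0)
place_order(conn, "carol", "EU", 45.5)

# A write that violates the CHECK constraint is rejected and leaves no partial data.
try:
    place_order(conn, "dave", "US", -10.0)
except sqlite3.IntegrityError:
    pass

# OLAP-style read: aggregate across many rows to answer a reporting question.
revenue_by_region = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(revenue_by_region)  # [('EU', 2, 165.5), ('US', 1, 80.0)]
```

The rejected order shows the data-integrity guarantee OLTP systems focus on, while the `GROUP BY` query is a tiny instance of the analytical workload OLAP systems are optimised for.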
Data usage is accelerating in volume, variety, and velocity. To keep pace, organizations need a data architecture that helps them:
- Improve data quality through progressive refinement.
- Enhance data governance with clear data lineage.
- Increase flexibility and scalability for different data workloads.
- Optimize data processing for efficiency and cost-effectiveness.
- Support multiple analytical needs from raw data exploration to strategic decision-making.
- Ensure compliance with data privacy and regulatory requirements.
By segmenting data into distinct layers, a ‘Medallion Architecture’ aims to transform raw data into a strategic asset, enabling businesses to derive value from their data more efficiently and effectively.
What is a Medallion Architecture?
Medallion Architecture organizes data into three distinct layers: Bronze for raw data ingestion, Silver for cleaning and basic processing, and Gold for advanced analytics and business-ready insights. The layering mirrors the process of refining raw metal into a valuable medallion, symbolizing the increasing value of data as it moves through these stages.
The Databricks website defines medallion architecture as follows:
A medallion architecture is a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables).
The Medallion Architecture is also referred to as a “multi-hop” architecture.
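The "multi-hop" idea can be sketched as an ordered set of layers in which data only ever moves forward, one hop at a time. The `Layer` enum and the `next_hop`/`hop_path` helpers below are hypothetical names chosen for illustration; they are not part of Databricks or any other platform's API.

```python
from enum import IntEnum

class Layer(IntEnum):
    """Medallion layers, ordered by increasing refinement (hypothetical model)."""
    BRONZE = 1
    SILVER = 2
    GOLD = 3

def next_hop(layer: Layer) -> Layer:
    """Return the layer that data from `layer` is promoted to in a single hop."""
    if layer is Layer.GOLD:
        raise ValueError("Gold is the final layer in this three-layer sketch")
    return Layer(layer + 1)

def hop_path(start: Layer, end: Layer) -> list:
    """List every layer data passes through on its way from `start` to `end`."""
    if end < start:
        raise ValueError("data flows forward only: Bronze => Silver => Gold")
    path = [start]
    while path[-1] < end:
        path.append(next_hop(path[-1]))
    return path

print([layer.name for layer in hop_path(Layer.BRONZE, Layer.GOLD)])
# ['BRONZE', 'SILVER', 'GOLD']
```

Because each hop reads only from the layer immediately before it, adding a new tier amounts to inserting one more member into the ordering.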
The idea of a “multi-hop” architecture expands the traditional three-layer system (Bronze, Silver, and Gold) into a more complex, multi-tiered data processing framework. This approach is designed to handle increasingly complex data workflows by adding further layers, each tailored to specific analytical or business needs:
- Bronze Layer: Acts as the entry point for all raw data, capturing information in its most basic form without transformation.
- Silver Layer: Here, data undergoes cleaning, deduplication, and basic transformations, preparing it for more sophisticated analysis.
- Gold Layer: Data is now fully processed, aggregated, and optimized for general business intelligence and reporting.
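The three layers can be illustrated end to end with a small, single-machine sketch in plain Python. Everything here is hypothetical: the raw feed, the field names (`order_id`, `region`, `amount`), the source label and the lineage columns (`_source`, `_ingested_at`) are assumptions made for the example. On a real platform the same steps would typically be written in PySpark or a managed pipeline and persisted as tables rather than Python lists.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw feed: JSON lines including a duplicate, a malformed line
# and inconsistent formatting, as might arrive from a source system.
RAW_LINES = [
    '{"order_id": 1, "region": "eu", "amount": "120.00"}',
    '{"order_id": 2, "region": "US ", "amount": "80"}',
    '{"order_id": 1, "region": "eu", "amount": "120.00"}',
    'not valid json',
    '{"order_id": 3, "region": "Eu", "amount": "45.50"}',
]

def to_bronze(lines, source):
    """Bronze: keep every raw line untouched, adding lineage metadata only."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{"raw": line, "_source": source, "_ingested_at": ingested_at} for line in lines]

def to_silver(bronze):
    """Silver: parse, standardize types and values, drop bad rows, deduplicate."""
    seen, silver = set(), []
    for row in bronze:
        try:
            rec = json.loads(row["raw"])
            order_id = int(rec["order_id"])
            region = rec["region"].strip().upper()
            amount = float(rec["amount"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # a real pipeline might quarantine these rows instead
        if order_id in seen:
            continue  # deduplicate on the (assumed) business key
        seen.add(order_id)
        silver.append({"order_id": order_id, "region": region,
                       "amount": amount, "_source": row["_source"]})
    return silver

def to_gold(silver):
    """Gold: aggregate into a business-ready table (orders and revenue per region)."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for rec in silver:
        totals[rec["region"]]["orders"] += 1
        totals[rec["region"]]["revenue"] += rec["amount"]
    return {region: dict(v) for region, v in sorted(totals.items())}

bronze = to_bronze(RAW_LINES, source="orders_feed.jsonl")
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver))  # 5 3
print(gold)  # {'EU': {'orders': 2, 'revenue': 165.5}, 'US': {'orders': 1, 'revenue': 80.0}}
```

Keeping Bronze as an untouched copy means Silver and Gold can be rebuilt if cleaning rules change, and carrying `_source` forward gives a simple form of the data lineage mentioned earlier.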
