Data Architecture, an overview related to AI implementations

A Data Architecture has several components, namely a Data Model, a Reference Architecture, and a Knowledge Graph. A Data Architecture is the language and representation of your Data Strategy, including the tooling, platforms, Common Data Model and how the data will be structured, used, traced and verified for quality.

Data Model

Data models are essential to artificial intelligence solutions, and to data and analytics solutions in general, because they provide an easy-to-understand view of what the data entities are and how they relate to each other.

A data model is typically drawn as a diagram of basic shapes such as rectangles and circles (which represent entities) connected by lines (which represent relationships).

The figure below shows four examples of different data models (dimensional data model, data vault data model, graph data model and document DB data model) that can support and enable analytics and A.I. initiatives.

The way we structure the data in a data model impacts how we use the data. For example, with a dimensional data model implemented as a star schema, aggregation queries like “what is our total sales amount and count by product in each branch for the month of December” can be relatively fast because the measures being aggregated (e.g., sales amount) live in the fact table and can be grouped by the dimensions joined to it (e.g., by product, in each branch, for a certain month).

Another example: if the use case calls for queries like “which colleagues of a given person have at least one year of experience in data modeling”, then a graph data model is a better structure to use because relationships between nodes (e.g., the colleagues of a person) are built into the structure of the graph data model.
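The December sales question above can be sketched as a query against a tiny star schema. This is a minimal illustration using Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_branch, dim_date), the columns and the sample rows are all assumptions made up for this example, not part of any particular platform.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table and three dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE dim_branch  (branch_id  INTEGER PRIMARY KEY, branch_name  TEXT);
        CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            product_id   INTEGER REFERENCES dim_product(product_id),
            branch_id    INTEGER REFERENCES dim_branch(branch_id),
            date_id      INTEGER REFERENCES dim_date(date_id),
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
    conn.executemany("INSERT INTO dim_branch VALUES (?, ?)", [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 11), (2, 2023, 12)])
    # Invented sample facts: (product_id, branch_id, date_id, sales_amount)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 1, 2, 1000.0),
        (1, 1, 2, 1200.0),
        (2, 1, 2, 500.0),
        (2, 2, 2, 450.0),
        (1, 2, 1, 999.0),  # a November sale, filtered out by the query below
    ])

def december_sales_by_product_and_branch(conn, year=2023):
    """Total sales amount and sale count per product and branch for December."""
    query = """
        SELECT p.product_name, b.branch_name, SUM(f.sales_amount), COUNT(*)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_branch  b ON b.branch_id  = f.branch_id
        JOIN dim_date    d ON d.date_id    = f.date_id
        WHERE d.year = ? AND d.month = 12
        GROUP BY p.product_name, b.branch_name
        ORDER BY p.product_name, b.branch_name
    """
    return conn.execute(query, (year,)).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = december_sales_by_product_and_branch(conn)
for product, branch, total, count in rows:
    print(product, branch, total, count)
```

The measures sit in one fact table and every grouping attribute comes from a dimension joined by a key, which is what keeps this kind of query simple to write and, on a real warehouse, comparatively cheap to run.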
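The colleague question above can be sketched with a very small in-memory graph. This is only an illustration under stated assumptions: the PeopleGraph class, its method names and the sample people are hypothetical, and a real solution would more likely use a graph database and its query language.

```python
from collections import defaultdict

class PeopleGraph:
    """Hypothetical toy graph: people are nodes, 'colleague' links are edges."""

    def __init__(self):
        self.colleagues = defaultdict(set)  # person -> set of colleagues
        self.skills = defaultdict(dict)     # person -> {skill: years of experience}

    def add_colleagues(self, a, b):
        # The colleague relationship is symmetric, so store it both ways.
        self.colleagues[a].add(b)
        self.colleagues[b].add(a)

    def add_skill(self, person, skill, years):
        self.skills[person][skill] = years

    def colleagues_with_skill(self, person, skill, min_years=1.0):
        """Colleagues of `person` with at least `min_years` of `skill`."""
        return sorted(
            c for c in self.colleagues.get(person, set())
            if self.skills.get(c, {}).get(skill, 0) >= min_years
        )

graph = PeopleGraph()
graph.add_colleagues("Ana", "Ben")
graph.add_colleagues("Ana", "Cara")
graph.add_colleagues("Ana", "Dev")
graph.add_skill("Ben", "data modeling", 2)
graph.add_skill("Cara", "data modeling", 0.5)
graph.add_skill("Dev", "SQL", 3)

matches = graph.colleagues_with_skill("Ana", "data modeling", min_years=1)
print(matches)
```

Because the relationship is stored directly on each node, answering the question is a walk over a person's neighbors rather than a series of table joins.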

Reference Architecture

The second example is a reference architecture of a modern data platform. This is a “catch-all” example because we can create different versions of a reference architecture (from different perspectives) that will support analytics or A.I. initiatives.

One version can be a diagram showing how data flows through the platform: data moves from the source systems to the data platform via an ingestion mechanism; it is then validated, standardized, enriched or protected by a curation engine; the enriched data (or information) is stored in polyglot storage using the best-fit data model; and finally the data is provisioned (as a data product) via an automated provisioning layer.
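The flow described above (ingestion, curation, polyglot storage, provisioning) can be sketched as a few plain functions. This is a toy illustration only: every function name, field name, the masking rule and the 1000 threshold are assumptions invented for the example, and a real platform would use dedicated tooling for each stage.

```python
def ingest(source_rows):
    """Ingestion: copy raw records from a source system into the platform."""
    return [dict(row) for row in source_rows]

def mask_email(email):
    """Protection: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

def curate(records):
    """Curation: validate, standardize, enrich and protect each record."""
    curated = []
    for record in records:
        if record.get("amount") is None:  # validate: drop incomplete rows
            continue
        amount = float(record["amount"])  # standardize: numeric amount
        curated.append({
            "customer": record["customer"].strip().title(),  # standardize names
            "amount": amount,
            "email": mask_email(record["email"]),            # protect
            "is_large_order": amount >= 1000,                # enrich (assumed rule)
        })
    return curated

def store(storage, model, records):
    """Polyglot storage: keep records under the data model that fits best."""
    storage.setdefault(model, []).extend(records)

def provision(storage, model, product_name):
    """Provisioning: expose stored records as a named data product."""
    return {"name": product_name, "model": model, "records": list(storage[model])}

def run_pipeline(source_rows):
    storage = {}
    store(storage, "dimensional", curate(ingest(source_rows)))
    return provision(storage, "dimensional", "curated_orders")

source = [
    {"customer": "  ana lopez ", "amount": "1500", "email": "ana@example.com"},
    {"customer": "ben", "amount": None, "email": "ben@example.com"},
]
product = run_pipeline(source)
print(product)
```

Keeping each stage as a separate function mirrors the diagram: any stage can be swapped out without changing the others.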

Another version of this reference architecture can be a diagram showing the foundational capabilities and components, like data governance, metadata management, master data management and data quality management, needed in a modern data platform. We then add capabilities and components like a feature store or an ML sandbox needed to enable, execute or implement the A.I. solution.

An example of this reference data architecture is shown below.

Knowledge Graph

A third example of a manifestation of a data architecture is a knowledge graph, which is defined by Stardog as the “representation of data that is enriched with real-world context, is based on the graph data structure, and has a flexible schema that allows for multiple definitions of the same data”.

Knowledge is a collection of information, which in turn is data imbued with meaning and context. This knowledge is organized in a knowledge graph that can serve as a semantic layer. An A.I. solution or application can then use this semantic layer for training models, making predictions, optimizing processes and generating content.

For example, we can leverage a knowledge graph in retrieval-augmented generation (RAG) to enhance the responses of large language models (LLMs).
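The retrieval step of such a setup can be sketched as follows. This is a deliberately naive illustration: the triples, the function names and the prompt wording are all hypothetical, entity matching is plain substring search, and the actual call to an LLM is left out because it depends on the provider being used.

```python
# A tiny knowledge graph stored as (subject, predicate, object) triples.
# All facts here are invented for the example.
TRIPLES = [
    ("Ana", "works_in", "Data Engineering"),
    ("Ana", "has_skill", "data modeling"),
    ("Ben", "works_in", "Data Engineering"),
    ("Data Engineering", "owns", "Sales Data Product"),
]

def retrieve_facts(triples, question):
    """Return triples whose subject or object is mentioned in the question.

    Naive substring matching; a real system would use entity linking
    or a graph query instead.
    """
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]

def build_prompt(question, facts):
    """Augment the question with the retrieved facts as grounding context."""
    context = "\n".join(
        f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts
    )
    return f"Use only these facts:\n{context}\n\nQuestion: {question}"

question = "What skills does Ana have?"
facts = retrieve_facts(TRIPLES, question)
prompt = build_prompt(question, facts)
print(prompt)  # this text would then be sent to the LLM of your choice
```

The point of the sketch is the shape of the flow: the knowledge graph supplies facts relevant to the question, and those facts are placed in front of the model so its answer is grounded in them.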

Data models, the reference architecture of a modern data platform, and knowledge graphs are just three examples or manifestations of data architecture that will power and enable any analytics or A.I. initiative.

In conclusion

In a digital landscape dominated by data and artificial intelligence, a robust and well-grounded data architecture is a necessity. It is the backbone that supports every data-driven and A.I.-led endeavor, from extracting insights to making informed decisions.

Data architecture’s role in shaping how data is collected, stored, structured, processed and provisioned directly impacts the success of analytics and A.I. initiatives.