Data Glossary, Data Dictionary, Data Catalog

A data glossary is a document that defines and describes the business terms an organization uses for its data, bridging the gap between business and IT. It gives the whole organization a common understanding of what each term means and how it is used, which helps ensure that data is used consistently and accurately.

A data dictionary is more technical. It is a centralized repository or document that holds detailed specifications of the data elements, attributes, and structures used in a database or data management system. It serves as a reference guide for data professionals, database administrators, developers, and other stakeholders involved in managing and using data.

A data catalog is a searchable inventory of an organization’s data assets. It stores metadata about each asset, such as its location, format, schema, and business context, so that users can find, understand, and use the data they need.


A data dictionary typically includes the following information for each data element:

1. Data Element Name: The name or label of the data element, often corresponding to a field or column name in a database table.
2. Data Type: The type of data that the element can hold, such as text, number, date, or Boolean. It specifies the format and constraints of the data.
3. Description: A detailed description of the data element, explaining its purpose, usage, and any relevant business context.
4. Length/Size: The maximum number of characters or digits that the data element can accommodate. For numeric data types, this may also include precision and scale information.
5. Constraints: Any constraints or rules that apply to the data element, such as unique constraints, primary key status, foreign key relationships, or required fields.
6. Default Value: The default value assigned to the data element if no value is specified during data entry or modification.
7. Data Source: Information about the source or origin of the data element, including the system or process that generates or captures the data.
8. Data Ownership: The person or team responsible for managing and maintaining the data element, including data stewardship and data quality responsibilities.
9. Usage Notes: Additional notes, guidelines, or comments that provide insights into the data element’s usage or special considerations.
10. Dependencies: Information about any dependencies or relationships with other data elements or tables within the database or data system.
11. History and Changes: A log of changes made to the data element, including previous values, dates of modification, and reasons for changes. This helps maintain an audit trail of data modifications.
12. Data Lineage: Information about how the data element is transformed, used, or flows within the organization’s data ecosystem. Data lineage helps track data movement and transformations.
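
The fields listed above map naturally onto a record type. The following is a minimal sketch in Python, not a standard schema: the class name DataElement, its field names, and the sample values for constraints, dependencies, and lineage are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    """One data dictionary entry; the fields mirror the list above."""
    name: str                                   # 1. data element name
    data_type: str                              # 2. data type
    description: str                            # 3. description
    length: Optional[int] = None                # 4. max characters or digits
    constraints: list[str] = field(default_factory=list)   # 5. constraints
    default_value: Optional[str] = None         # 6. default value
    data_source: Optional[str] = None           # 7. data source
    owner: Optional[str] = None                 # 8. data ownership
    usage_notes: str = ""                       # 9. usage notes
    dependencies: list[str] = field(default_factory=list)  # 10. dependencies
    history: list[str] = field(default_factory=list)       # 11. change log
    lineage: list[str] = field(default_factory=list)       # 12. data lineage


# Illustrative entry; the constraint and dependency values are assumptions.
customer_id = DataElement(
    name="customer_id",
    data_type="integer",
    description="A unique identifier for each customer.",
    length=10,
    constraints=["primary key", "not null"],
    data_source="CRM system",
    dependencies=["orders.customer_id"],
    lineage=["CRM system", "ERP system"],
)
```

Fields that are often unknown when an entry is first created (owner, history, default value) are optional here, so an entry can be registered early and filled in later.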

A data glossary and a data catalog are complementary tools that improve data governance and data management when used together. The glossary defines the business terms and the catalog inventories the data assets. Linking the two gives each catalog entry context and meaning, making the data easier to find and understand.
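
One way to picture this link is to tag each catalog column with a glossary term and then search the catalog by business term. The sketch below assumes in-memory Python structures; the asset names, locations, and the find_assets function are hypothetical, and a real catalog tool would expose its own API for this.

```python
# Glossary: business term -> definition (sample content for illustration).
glossary = {
    "Customer ID": "A unique identifier for each customer.",
    "Order date": "The date on which an order was placed.",
}

# Catalog: one entry per data asset; each column is tagged with a glossary term.
catalog = [
    {
        "asset": "customer",
        "location": "crm_db",
        "columns": {"customer_id": "Customer ID"},
    },
    {
        "asset": "orders",
        "location": "erp_db",
        "columns": {"customer_id": "Customer ID", "order_date": "Order date"},
    },
]


def find_assets(term):
    """Return (asset, column, definition) for every column tagged with the term."""
    definition = glossary[term]
    return [
        (entry["asset"], column, definition)
        for entry in catalog
        for column, tagged in entry["columns"].items()
        if tagged == term
    ]
```

Calling find_assets("Customer ID") returns one match from each of the two sample assets, each carrying the glossary definition alongside the physical column name.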

Example

Data Glossary:
  • Customer ID: A unique identifier for each customer.
  • Order date: The date on which an order was placed.
  • Order amount: The total amount of an order.
  • Shipping address: The address to which an order should be shipped.

Data Dictionary:
  • Customer ID: Data type: integer, Format: 10 digits, Source: CRM system, Target: ERP system, Relationships: one-to-many with Order table.
  • Order date: Data type: date, Format: YYYY-MM-DD, Source: ERP system, Target: Data warehouse, Relationships: column of the Order table, which is many-to-one with the Customer table.
  • Order amount: Data type: decimal, Format: precision 10, scale 2, Source: ERP system, Target: Data warehouse, Relationships: column of the Order table, which is many-to-one with the Customer table.
  • Shipping address: Data type: varchar(255), Format: street address, city, state, zip code, Source: CRM system, Target: ERP system, Relationships: one-to-one with Customer table.

Data Catalog:
  • Customer table: This table contains information about customers, such as customer ID, name, address, and contact information.
  • Order table: This table contains information about orders, such as order ID, customer ID, order date, order amount, and shipping address.
  • Product table: This table contains information about products, such as product ID, product name, description, price, and quantity in stock.
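
The data dictionary entries in this example are precise enough to check records against. The following Python sketch shows one possible way to do that. The function names and snake_case field names are hypothetical, "10 digits" is read as "at most 10 digits", and the order amount is treated as a decimal with precision 10 and scale 2; all of these are assumptions rather than part of the example itself.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation


def check_customer_id(value):
    # Integer with at most 10 digits (assumed reading of "10 digits").
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 10**10


def check_order_date(value):
    # Must be written as YYYY-MM-DD and be a real calendar date.
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def check_order_amount(value):
    # Decimal with at most 2 digits after the point and 8 before it.
    try:
        amount = Decimal(str(value))
    except InvalidOperation:
        return False
    if not amount.is_finite():
        return False
    _, digits, exponent = amount.as_tuple()
    scale = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    return scale <= 2 and integer_digits <= 8


def check_shipping_address(value):
    # varchar(255): a string of at most 255 characters.
    return isinstance(value, str) and len(value) <= 255


CHECKS = {
    "customer_id": check_customer_id,
    "order_date": check_order_date,
    "order_amount": check_order_amount,
    "shipping_address": check_shipping_address,
}


def validate(record):
    """Return the names of fields that are missing or break a dictionary rule."""
    return [
        name
        for name, check in CHECKS.items()
        if name not in record or not check(record[name])
    ]
```

An empty list from validate means the record satisfies every rule. Otherwise the list names the offending fields, which can be reported back to the owner of the source system.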

https://aws.amazon.com/blogs/big-data/simplify-data-discovery-for-business-users-by-adding-data-descriptions-in-the-aws-glue-data-catalog/