Skip to main content

πŸ“š Reference Tables

Reference Tables map various identifiers and data types (like 🌍 region, 🏷️ brand, πŸ“Š status, πŸͺ dealer) and are essential for correct data integration, attribute mapping, and meaningful reporting throughout your data platform.

Reference Tables


πŸ’‘ Why?​

  • βœ… Enable correct mapping, filtering, and grouping in reporting by providing controlled vocabularies and dictionaries.
  • πŸ›‘οΈ Ensure label, dictionary, and hierarchy consistency across all layers and systems, making it possible to translate codes or IDs into business-friendly names and groups.
  • πŸ—οΈ Serve as a backbone for attribute mapping, transformations, and business logic, ensuring every fact or transaction can be described with relevant, up-to-date reference data.

πŸ“ Where?​

  • Most often in the πŸ—„οΈ Staging Layer (when raw codes are translated), but also in πŸ› οΈ Core for further mappings, enrichment, or business logic application.

πŸ› οΈ Methods​

1. πŸ”‘ Table Uniqueness​

Checks for duplicate key values (e.g., duplicate region codes).
πŸ“œ Description: This check verifies that each identifier in a reference table (such as a region code, status code, or brand ID) is unique. Duplicate keys in reference tables can cause incorrect joins, misreporting, or failed data integrations.
⚠️ Why is this important? If reference values are not unique, facts and dimensions that join to them may link incorrectly, leading to wrong groupings, incorrect aggregations, and possible loss of data integrity throughout all layers.

πŸ’» Example:

SELECT region_code, COUNT(*)
FROM ref_region
GROUP BY region_code
HAVING COUNT(*) > 1

2. πŸ“‹ Table Completeness​

Checks if all required records exist (e.g., full list of countries or brands).
πŸ“œ Description: This method ensures your reference table contains all necessary entries as dictated by business rules, regulatory requirements, or master data governance. Missing reference values can lead to unmapped or unclassified data, which may be excluded from reporting or analysis.
⚠️ Why is this important? Data completeness in reference tables guarantees every fact or transaction can be correctly categorized, reported, and analyzed. For example, if a country code is missing, sales in that country might go unreported or trigger errors in BI tools.

πŸ’» Example:

SELECT * FROM required_regions
WHERE code NOT IN (SELECT region_code FROM ref_region)

3. 🧩 Check​

Any custom validation (e.g., parent_id points to a valid region).
πŸ“œ Description: Custom checks validate the business logic, relationships, or integrity rules specific to your reference data. Typical examples include checking that parent-child hierarchies are valid, or that all referenced IDs exist within the same table (self-references, like parent regions).
⚠️ Why is this important? Broken references or hierarchy issues can result in failed drilldowns, errors in reporting, or broken navigation structures in your BI solution. Ensuring parent-child integrity is especially critical for multi-level groupings or nested reporting.

πŸ’» Example:

SELECT * FROM ref_region
WHERE parent_id IS NOT NULL
AND parent_id NOT IN (SELECT region_id FROM ref_region)