Skip to main content

πŸ— Core Layer

The Core Layer is the main integration and transformation layer β€” where data from multiple sources is deduplicated, enriched, and prepared for reporting and analytics.

Core Layer


Why?​

  • This is where the main business logic, mappings, and data transformations happen β€” errors here impact the entire warehouse and all downstream reporting.
  • Ensures that only high-quality, well-integrated, and business-ready data moves on to semantic or reporting layers.

Where?​

  • The Core Layer is positioned after Staging and before Semantic/Reporting layers.

Methods​

1. Data Completeness​

Purpose: Ensures all expected and required records are present after integration, mapping, and transformation.
Why it matters: Missing records can signal data loss during ETL, incorrect joins, or filter logic errors. Missing data here means incomplete analytics and potentially wrong business decisions.
Example:

SELECT * 
FROM core_orders
WHERE important_flag IS NULL;

2. Data Uniqueness​

Purpose: Confirms that fields meant to be unique identifiers (e.g., customer_id, transaction_number) are truly unique after transformation and integration.
Why it matters: Duplicates can distort KPIs, cause double-counting, or break referential integrity, often appearing after merging multiple sources.
Example:

SELECT customer_id, COUNT(*) 
FROM core_customers
GROUP BY customer_id
HAVING COUNT(*) > 1;

3. Referential Integrity​

Purpose: Ensures all foreign keys reference existing primary keys.
Why it matters: Broken referential integrity causes failed joins, orphaned records, and incorrect analytics.
Example:

SELECT order_id 
FROM core_orders
WHERE customer_id NOT IN (
SELECT customer_id FROM core_customers
);

4. Record Count​

Purpose: Compares the number of records between layers (e.g., Staging β†’ Core).
Why it matters: Large differences may signal join/filter issues or ETL bugs. Counts should remain consistent unless business rules explain changes.
Example:

SELECT COUNT(*) FROM staging_orders;
SELECT COUNT(*) FROM core_orders;

5. Attribute Mapping​

Purpose: Ensures attributes (e.g., statuses, categories) are mapped according to business rules and reference tables.
Why it matters: Incorrect mappings cause misclassification, broken business logic, and reporting errors.
Example:

SELECT * 
FROM core_orders
WHERE order_status NOT IN (
SELECT status FROM ref_statuses
);

6. Delta Processing​

Purpose: Validates that only new or changed data is processed in incremental ETL loads.
Why it matters: Incorrect delta logic leads to missing updates or duplicate records.
Example:

SELECT * 
FROM core_delta_load
WHERE load_timestamp > (
SELECT MAX(load_timestamp) FROM previous_loads
);

7. History Handling​

Purpose: Checks that historical changes (e.g., Slowly Changing Dimensions) are tracked correctly with valid date ranges.
Why it matters: Poor history handling causes inaccurate historical analytics and audit issues.
Example:

SELECT * 
FROM core_customer_history
WHERE valid_to < valid_from;

8. History Start and End Dates​

Purpose: Ensures all versioned records have valid and non-null date ranges.
Why it matters: Incorrect date ranges prevent accurate reconstruction of historical states.
Example:

SELECT * 
FROM core_customer_history
WHERE valid_from IS NULL OR valid_to IS NULL;

9. Full Dump Processing​

Purpose: Validates that full load jobs correctly reload all data into the Core Layer.
Why it matters: Faulty full loads can miss records or introduce duplicates, making Core inconsistent with upstream layers.
Example:

SELECT COUNT(*) FROM full_load_table;