π End-to-End Testing
End-to-End (E2E) tests treat the entire ETL pipeline as a βblack boxβ, verifying that from source to final output, the data behaves exactly as the business expects.

Why Use End-to-End Testing?β
- β Guarantees that data moves through the process with integrity, completeness, and business meaning β regardless of transformations and mappings at each step.
- π Detects errors, data loss, mapping mismatches, or transformation bugs that individual layer checks may miss.
- π Validates the full business process, not just isolated technical steps.
Where Is It Applied?β
- Cross-layer validation:
Comparing data across multiple layers:- Staging β Core
- Core β Semantic
- Semantic β Reporting
- Or even Source Systems β Final Reports
Methodsβ
1. Data Reconciliationβ
Purpose: Compares data between layers or systems to ensure nothing is lost, duplicated, or incorrectly transformed.
Description:
Reconciliation checks can be done using:
- Row counts
- Control totals
- Hashes of concatenated fields
Why is this important?
Even if individual layers pass their tests, the combined effect of transformations, joins, and filters may introduce subtle errors.
Reconciliation gives confidence that final reported numbers match the original source after all business rules are applied.
Example:
SELECT COUNT(*)
FROM (
SELECT md5(concat_ws('|', *)) AS hash
FROM core_data
EXCEPT
SELECT md5(concat_ws('|', *)) AS hash
FROM semantic_data
) AS diff
2. Attribute Mappingβ
Purpose: Ensures attribute values (e.g., statuses, categories) remain consistent through all layers.
Description:
Checks that key attribute values such as:
- Status codes
- Product categories
- Region mappings
are correctly translated from source β core β semantic β reporting layers.
Why is this important?
Inconsistent mapping can cause:
- Reporting errors
- Lost facts
- Business confusion
End-to-end mapping checks ensure traceability and correctness of critical business fields.
Example:
SELECT c.id, c.status, s.status
FROM core_orders c
JOIN semantic_orders s ON c.id = s.id
WHERE c.status <> s.status