๐ DQ Dimensions
Data Quality Dimensions represent the fundamental perspectives from which the health and fitness of data are assessed. They provide a structured vocabulary for identifying, categorizing, and addressing quality issues, ensuring that discussions about โdata qualityโ are precise and measurable rather than abstract.
In our UI, each dimension acts as a category under which DQ checks and measurements are aggregated, allowing both technical and business stakeholders to interpret results in the same context. For example, a โCompletenessโ percentage in the dashboard doesnโt just mean โsome values are missingโ โ it reflects all tests related to missing fields, missing records, or incomplete reference datasets, grouped together under a business-relevant quality perspective.
The seven dimensions tracked in the UI โ โ Correctness, ๐ Consistency, โฑ Timeliness, ๐ฏ Accuracy, ๐ฆ Completeness, ๐ Validity, and ๐ Integrity โ form the backbone of our quality monitoring process.
โ
โ Correctnessโ
Definition: Measures whether data values are correct for their intended business context, based on trusted sources or predefined business rules. Correctness is about the truthfulness of data in relation to real-world values.
Why it matters: Incorrect data leads to operational errors, flawed decisions, compliance violations, and customer dissatisfaction.
Examples:
- ๐ฐ Product price in the database matches the official price list published by the finance department.
- ๐ Country codes in the system match ISO-3166 standard values.
- ๐ Customer postal code correctly corresponds to their registered city.
Note: Some frameworks group Correctness under Accuracy; in our UI, it is tracked separately for better granularity.
๐ Consistencyโ
Definition: Ensures that data remains uniform and free from contradictions when compared across different datasets, systems, or time periods.
Why it matters: Inconsistent data causes reporting mismatches, breaks integrations, and reduces confidence in analytics.
Examples:
- ๐ The
total_amountin theOrderstable matches the sum of line item amounts inOrderLines. - ๐งพ Gender is stored as
M/Fin one database andMale/Femalein another โ these must be harmonized. - ๐ A customerโs address is stored differently in CRM and billing systems.
โฑ Timelinessโ
Definition: Measures whether data is available within the required timeframes defined by SLAs or business rules.
Why it matters: Even accurate data loses value if it is not available when needed for decision-making or automated processes.
Examples:
- ๐ Daily sales transactions must be available by 6:00 AM to feed morning business dashboards.
- ๐ก Real-time sensor readings must be ingested within 5 seconds to trigger safety systems.
- ๐ Monthly reports must be generated before the 5th business day of the next month.
๐ฏ Accuracyโ
Definition: Measures how closely data values match actual real-world facts or a reliable reference source.
Why it matters: Inaccurate data erodes trust and can cause costly errors in operations, planning, and compliance.
Examples:
- ๐ Customerโs date of birth in the system matches their government-issued ID.
- ๐ Recorded GPS coordinates match the actual physical location of a store.
- ๐ The CPI (Consumer Price Index) stored matches the official figure from the statistical office.
๐ฆ Completenessโ
Definition: Evaluates whether all required data is present โ including full datasets, mandatory fields, and all relevant reference values.
Why it matters: Missing information reduces analytical reliability and can stop business processes from running.
Examples:
- ๐ซ A sales order is missing the
customer_idโ preventing invoice generation. - ๐ The
statusreference table lacks an expected value like โSuspendedโ. - ๐ง Customer records without email addresses make it impossible to send notifications.
๐ Validityโ
Definition: Checks whether data values conform to the required format, range, or domain specified by business rules, data models, or regulations.
Why it matters: Invalid data can cause process failures, calculation errors, or misclassification in analytics.
Examples:
- ๐ฎ Email addresses must match the pattern
^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$. - ๐ The
agefield must be between 18 and 120. - ๐ Order status must be one of
('NEW', 'PROCESSING', 'SHIPPED', 'CANCELLED').
๐ Integrityโ
Definition: Ensures that relationships between data entities follow defined constraints โ typically referential integrity (foreign key relationships) and cardinality rules.
Why it matters: Missing or incorrect links create orphaned records, break joins in reporting, and distort aggregated metrics.
Examples:
- ๐ Every
order.customer_idmust exist in thecustomers.customer_idtable. - ๐ No employee can have more than one active ID badge when the relationship is 1:1.
- ๐ Product records must link to an existing category in the
categoriestable.
๐ Mapping of DQ Dimensions to X-AutoMate DQ Checks
This mapping links the Data Quality Dimensions used in our UI to the corresponding X-AutoMate DQ Check categories.
It ensures that technical tests are always aligned with the business meaning of each dimension.
| DQ Dimension | Relevant X-AutoMate DQ Checks (Category) | Purpose / Examples |
|---|---|---|
| โ Correctness | - Numeric โ Value Range- DateTime โ Value Range- Categorical โ Cross-table Match- Custom SQL Validation | Verifies correctness against trusted sources.Examples: product_price matches price list, postal_code matches city, VAT rate matches legal rate. |
| ๐ Consistency | - Categorical โ Allowed Values- Numeric โ Allowed Values- Core Layer โ Attribute Mapping | Prevents contradictions across systems.Examples: order_total matches sum of items, status codes match across CRM/ERP. |
| โฑ Timeliness | - DateTime โ Allowed Values- Max / Min Date Checks | Ensures SLA compliance.Examples: data updated daily, ETL loads before 6 AM, sensor data ingested within seconds. |
| ๐ฏ Accuracy | - Numeric โ Value Range- DateTime โ Value Range- Cross-check with External Data | Confirms closeness to real-world values.Examples: exchange_rate matches central bank, GPS within tolerance, birth_date matches ID. |
| ๐ฆ Completeness | - Null Count Checks- Missing Values- Row Count vs. Expected Count | Ensures no missing data.Examples: all customers have IDs, all statuses are present, all branches report sales. |
| ๐ Validity | - Regex Checks- LIKE / NOT LIKE- Length Range | Validates format/range/domain.Examples: email regex, age range, valid status list. |
| ๐ Integrity | - Referential Integrity Checks- Cross-table Match | Ensures valid relationships.Examples: all orders link to customers, no orphaned children, valid category IDs. |
Notes:
- Each test result is tagged with its mapped dimension so dashboard percentages reflect the correct quality aspect.
- Additional checks may also align with these dimensions depending on business rules.