Skip to main content

๐Ÿ“Š DQ Dimensions

Data Quality Dimensions represent the fundamental perspectives from which the health and fitness of data are assessed. They provide a structured vocabulary for identifying, categorizing, and addressing quality issues, ensuring that discussions about โ€œdata qualityโ€ are precise and measurable rather than abstract.

In our UI, each dimension acts as a category under which DQ checks and measurements are aggregated, allowing both technical and business stakeholders to interpret results in the same context. For example, a โ€œCompletenessโ€ percentage in the dashboard doesnโ€™t just mean โ€œsome values are missingโ€ โ€” it reflects all tests related to missing fields, missing records, or incomplete reference datasets, grouped together under a business-relevant quality perspective.

The seven dimensions tracked in the UI โ€” โœ… Correctness, ๐Ÿ”„ Consistency, โฑ Timeliness, ๐ŸŽฏ Accuracy, ๐Ÿ“ฆ Completeness, ๐Ÿ“ Validity, and ๐Ÿ”— Integrity โ€” form the backbone of our quality monitoring process.

DQ Dimensionsโ€‹

โœ… Correctnessโ€‹

Definition: Measures whether data values are correct for their intended business context, based on trusted sources or predefined business rules. Correctness is about the truthfulness of data in relation to real-world values.
Why it matters: Incorrect data leads to operational errors, flawed decisions, compliance violations, and customer dissatisfaction.
Examples:

  • ๐Ÿ’ฐ Product price in the database matches the official price list published by the finance department.
  • ๐ŸŒ Country codes in the system match ISO-3166 standard values.
  • ๐Ÿ™ Customer postal code correctly corresponds to their registered city.

Note: Some frameworks group Correctness under Accuracy; in our UI, it is tracked separately for better granularity.


๐Ÿ”„ Consistencyโ€‹

Definition: Ensures that data remains uniform and free from contradictions when compared across different datasets, systems, or time periods.
Why it matters: Inconsistent data causes reporting mismatches, breaks integrations, and reduces confidence in analytics.
Examples:

  • ๐Ÿ“Š The total_amount in the Orders table matches the sum of line item amounts in OrderLines.
  • ๐Ÿงพ Gender is stored as M/F in one database and Male/Female in another โ€” these must be harmonized.
  • ๐Ÿ  A customerโ€™s address is stored differently in CRM and billing systems.

โฑ Timelinessโ€‹

Definition: Measures whether data is available within the required timeframes defined by SLAs or business rules.
Why it matters: Even accurate data loses value if it is not available when needed for decision-making or automated processes.
Examples:

  • ๐Ÿ“… Daily sales transactions must be available by 6:00 AM to feed morning business dashboards.
  • ๐Ÿ“ก Real-time sensor readings must be ingested within 5 seconds to trigger safety systems.
  • ๐Ÿ—“ Monthly reports must be generated before the 5th business day of the next month.

๐ŸŽฏ Accuracyโ€‹

Definition: Measures how closely data values match actual real-world facts or a reliable reference source.
Why it matters: Inaccurate data erodes trust and can cause costly errors in operations, planning, and compliance.
Examples:

  • ๐Ÿ†” Customerโ€™s date of birth in the system matches their government-issued ID.
  • ๐Ÿ“ Recorded GPS coordinates match the actual physical location of a store.
  • ๐Ÿ“ˆ The CPI (Consumer Price Index) stored matches the official figure from the statistical office.

๐Ÿ“ฆ Completenessโ€‹

Definition: Evaluates whether all required data is present โ€” including full datasets, mandatory fields, and all relevant reference values.
Why it matters: Missing information reduces analytical reliability and can stop business processes from running.
Examples:

  • ๐Ÿšซ A sales order is missing the customer_id โ€” preventing invoice generation.
  • ๐Ÿ“‹ The status reference table lacks an expected value like โ€œSuspendedโ€.
  • ๐Ÿ“ง Customer records without email addresses make it impossible to send notifications.

๐Ÿ“ Validityโ€‹

Definition: Checks whether data values conform to the required format, range, or domain specified by business rules, data models, or regulations.
Why it matters: Invalid data can cause process failures, calculation errors, or misclassification in analytics.
Examples:

  • ๐Ÿ“ฎ Email addresses must match the pattern ^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$.
  • ๐ŸŽ‚ The age field must be between 18 and 120.
  • ๐Ÿšš Order status must be one of ('NEW', 'PROCESSING', 'SHIPPED', 'CANCELLED').

๐Ÿ”— Integrityโ€‹

Definition: Ensures that relationships between data entities follow defined constraints โ€” typically referential integrity (foreign key relationships) and cardinality rules.
Why it matters: Missing or incorrect links create orphaned records, break joins in reporting, and distort aggregated metrics.
Examples:

  • ๐Ÿ›’ Every order.customer_id must exist in the customers.customer_id table.
  • ๐Ÿ†” No employee can have more than one active ID badge when the relationship is 1:1.
  • ๐Ÿ—‚ Product records must link to an existing category in the categories table.

๐Ÿ“Œ Mapping of DQ Dimensions to X-AutoMate DQ Checks

This mapping links the Data Quality Dimensions used in our UI to the corresponding X-AutoMate DQ Check categories.
It ensures that technical tests are always aligned with the business meaning of each dimension.

DQ DimensionRelevant X-AutoMate DQ Checks (Category)Purpose / Examples
โœ… Correctness- Numeric โ†’ Value Range- DateTime โ†’ Value Range- Categorical โ†’ Cross-table Match- Custom SQL ValidationVerifies correctness against trusted sources.Examples: product_price matches price list, postal_code matches city, VAT rate matches legal rate.
๐Ÿ”„ Consistency- Categorical โ†’ Allowed Values- Numeric โ†’ Allowed Values- Core Layer โ†’ Attribute MappingPrevents contradictions across systems.Examples: order_total matches sum of items, status codes match across CRM/ERP.
โฑ Timeliness- DateTime โ†’ Allowed Values- Max / Min Date ChecksEnsures SLA compliance.Examples: data updated daily, ETL loads before 6 AM, sensor data ingested within seconds.
๐ŸŽฏ Accuracy- Numeric โ†’ Value Range- DateTime โ†’ Value Range- Cross-check with External DataConfirms closeness to real-world values.Examples: exchange_rate matches central bank, GPS within tolerance, birth_date matches ID.
๐Ÿ“ฆ Completeness- Null Count Checks- Missing Values- Row Count vs. Expected CountEnsures no missing data.Examples: all customers have IDs, all statuses are present, all branches report sales.
๐Ÿ“ Validity- Regex Checks- LIKE / NOT LIKE- Length RangeValidates format/range/domain.Examples: email regex, age range, valid status list.
๐Ÿ”— Integrity- Referential Integrity Checks- Cross-table MatchEnsures valid relationships.Examples: all orders link to customers, no orphaned children, valid category IDs.

Notes:

  • Each test result is tagged with its mapped dimension so dashboard percentages reflect the correct quality aspect.
  • Additional checks may also align with these dimensions depending on business rules.