π DQ Checks
Data Quality checks confirm that your data meets the expected business and technical rules at every layer of the warehouse. Commonly referred as Data Validation, Data Integrity, or Data Consistency, Data Quality checks are a critical part of data management and ensure that your data is accurate, reliable, and consistent. These checks cover a wide spectrum of validation needs, including:
- Null and missing values
- Mathematical conditions
- Allowed values
- Numeric and text ranges
- Uniqueness constraints
- Text patterns (LIKE / NOT LIKE, regular expressions)
- Date validations
- Suspicious zero values
- Text length checks
- Pre- and post-transformation validations

Why?β
- Prevents reporting errors, analytical inconsistencies, and loss of trust in data.
- Detects anomalies and issues early at every stage of processing.
Where?β
- Applicable at every layer β from landing, through core, to semantic and reporting.
Data Quality Checksβ
Below are the most commonly used Data Quality checks in X-AutoMate, grouped by category.
π Emptinessβ
1. Null Countβ
Counts the number of records with a NULL value in a given column.
Description:
Identifies missing, incomplete, or improperly loaded data. Especially important for mandatory fields like customer names, transaction amounts, or key business dates.
Why important?
Missing values can cause wrong conclusions, break business processes, or distort KPIs.
Example:
SELECT COUNT(*)
FROM customer
WHERE first_name IS NULL;
π’ Numericβ
1.Null Countβ
Counts missing numeric values (NULL values in integer, decimal, or floating-point columns).
Example:
SELECT COUNT(*)
FROM orders
WHERE order_amount IS NULL;
2. Mathβ
Validates numeric values using operators (<, >, =, etc.) to enforce logical or business rules.
Example:
SELECT *
FROM employee
WHERE age < 18;
3. Allowed Valuesβ
Ensures numeric values are only from a predefined set.
Example:
SELECT *
FROM orders
WHERE status NOT IN (1, 2, 3);
4. Value Rangeβ
Checks if numeric values are within a defined range.
Example:
SELECT *
FROM products
WHERE price NOT BETWEEN 1 AND 9999;
5. Zeroes Countβ
Counts values equal to zero.
Example:
SELECT COUNT(*)
FROM invoice
WHERE amount = 0;
6. Uniquenessβ
Detects duplicate values in a numeric column.
Example:
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
π DateTimeβ
3. Allowed Valuesβ
Checks if timestamps are from a valid predefined set.
Example:
SELECT *
FROM events
WHERE event_time NOT IN ('2024-01-01', '2024-12-31');
4. Value Rangeβ
Validates timestamps are within a logical range.
Example:
SELECT *
FROM orders
WHERE order_date NOT BETWEEN '2024-01-01' AND CURRENT_DATE;
1. Null Countβ
Detects missing timestamps.
Example:
SELECT COUNT(*)
FROM visits
WHERE visit_time IS NULL;
6. Uniquenessβ
Finds duplicate timestamps.
Example:
SELECT event_time, COUNT(*)
FROM logs
GROUP BY event_time
HAVING COUNT(*) > 1;
π·οΈ Categoricalβ
3. Allowed Valuesβ
Ensures string values match a valid list.
Example:
SELECT *
FROM customer
WHERE country NOT IN ('PL', 'NO', 'SE');
6. Uniquenessβ
Finds duplicate strings in a column.
Example:
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
7. Missing Valuesβ
Detects empty or NULL string values.
Example:
SELECT *
FROM customer
WHERE TRIM(first_name) = '' OR first_name IS NULL;
8. Length Rangeβ
Checks if string length is within limits.
Example:
SELECT *
FROM customer
WHERE LENGTH(last_name) NOT BETWEEN 2 AND 30;
9. LIKE / NOT LIKEβ
Validates values match expected patterns.
Example:
SELECT *
FROM products
WHERE name NOT LIKE 'PROD%';
10. Regular Expressionβ
Validates values with regex.
Example:
SELECT *
FROM users
WHERE email NOT REGEXP '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$';
β Using these DQ checks ensures high data quality across all stages β from loading to reporting.