
Test Definitions Reference

This page provides a complete reference for all data quality test definitions available in the OpenMetadata Python SDK. Tests are organized into two categories: Table-Level Tests and Column-Level Tests.

Importing Test Definitions

All test definitions are available from the metadata.sdk.data_quality module:
from metadata.sdk.data_quality import (
    # Table tests
    TableRowCountToBeBetween,
    TableColumnCountToBeBetween,
    TableCustomSQLQuery,

    # Column tests
    ColumnValuesToBeNotNull,
    ColumnValuesToBeUnique,
    ColumnValuesToBeBetween,
)

Common Parameters

All test definitions support these optional parameters:
  • name (str): Unique identifier for the test case
  • display_name (str): Human-readable name shown in the UI
  • description (str): Detailed description of what the test validates
Column tests additionally require:
  • column (str, required): Name of the column to test

Table-Level Tests

Table-level tests validate properties of entire tables, such as row counts, column counts, or custom SQL queries.

TableRowCountToBeBetween

Validates that the number of rows in a table falls within a specified range. Parameters:
  • min_count (int, optional): Minimum acceptable number of rows
  • max_count (int, optional): Maximum acceptable number of rows
Example:
from metadata.sdk.data_quality import TableRowCountToBeBetween

# Table should have between 1,000 and 100,000 rows
test = TableRowCountToBeBetween(min_count=1000, max_count=100000)

# At least 1,000 rows (no maximum)
test = TableRowCountToBeBetween(min_count=1000)

# At most 50,000 rows (no minimum)
test = TableRowCountToBeBetween(max_count=50000)
Use Cases:
  • Monitor data volume and detect data loss
  • Validate expected data growth patterns
  • Detect unexpected data surges

TableRowCountToEqual

Validates that the table has an exact number of rows. Parameters:
  • row_count (int, required): Expected number of rows
Example:
from metadata.sdk.data_quality import TableRowCountToEqual

# Table must have exactly 50 rows
test = TableRowCountToEqual(row_count=50)
Use Cases:
  • Validate fixed-size reference tables
  • Ensure complete dimension table loads
  • Verify static lookup tables

TableColumnCountToBeBetween

Validates that the number of columns in a table falls within a specified range. Parameters:
  • min_count (int, optional): Minimum acceptable number of columns
  • max_count (int, optional): Maximum acceptable number of columns
Example:
from metadata.sdk.data_quality import TableColumnCountToBeBetween

# Table should have between 5 and 20 columns
test = TableColumnCountToBeBetween(min_count=5, max_count=20)
Use Cases:
  • Schema validation
  • Detect unexpected column additions or removals
  • Monitor schema evolution

TableColumnCountToEqual

Validates that the table has an exact number of columns. Parameters:
  • column_count (int, required): Expected number of columns
Example:
from metadata.sdk.data_quality import TableColumnCountToEqual

# Table must have exactly 10 columns
test = TableColumnCountToEqual(column_count=10)
Use Cases:
  • Strict schema validation
  • Ensure schema stability
  • Prevent schema drift

TableColumnNameToExist

Validates that a specific column exists in the table schema. Parameters:
  • column_name (str, required): Name of the column that must exist
Example:
from metadata.sdk.data_quality import TableColumnNameToExist

# Ensure 'customer_id' column exists
test = TableColumnNameToExist(column_name="customer_id")
Use Cases:
  • Verify required columns are present
  • Ensure critical columns aren’t dropped
  • Validate schema migrations

TableColumnToMatchSet

Validates that table columns match an expected set of column names. Parameters:
  • column_names (list[str], required): List of expected column names
  • ordered (bool, optional): If True, column order must match exactly (default: False)
Example:
from metadata.sdk.data_quality import TableColumnToMatchSet

# Columns should match this set (any order)
test = TableColumnToMatchSet(
    column_names=["id", "name", "email", "created_at"]
)

# Columns must match in exact order
test = TableColumnToMatchSet(
    column_names=["id", "name", "email"],
    ordered=True
)
Use Cases:
  • Validate complete schema structure
  • Ensure schema consistency across environments
  • Detect unexpected schema changes

TableRowInsertedCountToBeBetween

Validates that the number of rows inserted within a time range is within bounds. Parameters:
  • min_count (int, optional): Minimum acceptable number of inserted rows
  • max_count (int, optional): Maximum acceptable number of inserted rows
  • range_type (str, optional): Time unit: "HOUR", "DAY", "WEEK", or "MONTH" (default: "DAY")
  • range_interval (int, optional): Number of time units to look back (default: 1)
Example:
from metadata.sdk.data_quality import TableRowInsertedCountToBeBetween

# 100-1000 rows inserted in the last 24 hours
test = TableRowInsertedCountToBeBetween(
    min_count=100,
    max_count=1000,
    range_type="DAY",
    range_interval=1
)

# At least 50 rows inserted in the last 6 hours
test = TableRowInsertedCountToBeBetween(
    min_count=50,
    range_type="HOUR",
    range_interval=6
)
Use Cases:
  • Monitor data ingestion rates
  • Detect ingestion pipeline failures
  • Validate ETL job completions

TableCustomSQLQuery

Validates data using a custom SQL query expression. Parameters:
  • sql_expression (str, required): SQL query to execute
  • strategy (str, optional): "ROWS" treats each row returned by the query as a failure; "COUNT" expects the query itself to return the number of failing rows (default: "ROWS")
Example:
from metadata.sdk.data_quality import TableCustomSQLQuery

# Find negative prices (returns failing rows)
test = TableCustomSQLQuery(
    sql_expression="SELECT * FROM {table} WHERE price < 0",
    strategy="ROWS"
)

# Count orphaned records
test = TableCustomSQLQuery(
    sql_expression="""
        SELECT COUNT(*) FROM {table} t
        LEFT JOIN parent_table p ON t.parent_id = p.id
        WHERE p.id IS NULL
    """,
    strategy="COUNT"
)
Use Cases:
  • Implement custom business logic validation
  • Validate referential integrity
  • Check complex data relationships

TableDiff

Compares two tables and identifies differences in their data. Parameters:
  • table2 (str, required): Fully qualified name of the comparison table
  • key_columns (list[str], optional): Columns to use as join keys
  • table2_key_columns (list[str], optional): Join key columns from table 2
  • use_columns (list[str], optional): Specific columns to compare
  • extra_columns (list[str], optional): Additional columns to include in output
  • table2_extra_columns (list[str], optional): Additional columns from table 2
Example:
from metadata.sdk.data_quality import TableDiff

# Compare tables using 'id' as key
test = TableDiff(
    table2="Postgres.warehouse.staging.reference_customers",
    key_columns=["id"],
    use_columns=["name", "email", "status"]
)

# Compare with different key columns
test = TableDiff(
    table2="MySQL.prod.db.legacy_users",
    key_columns=["customer_id"],
    table2_key_columns=["user_id"],
    use_columns=["name", "email"]
)
Use Cases:
  • Validate data migrations
  • Verify data replication
  • Compare production vs staging data

For column-level tests (null checks, uniqueness, regex, ranges, and more), see Column-Level Test Definitions.