Advanced Usage
This guide covers advanced patterns and configurations for Data Quality as Code, including loading tests from YAML files, customizing workflow configurations, and integrating with production systems.
Loading Tests from YAML
You can load test definitions from YAML workflow files, enabling version-controlled test configurations:
Basic YAML Loading
from metadata.sdk.data_quality import TestRunner
# Load from YAML file
runner = TestRunner.from_yaml(file_path="tests/customer_quality.yaml")
# Or from YAML string
yaml_config = """
source:
  type: TestSuite
  serviceName: local_postgres
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: Postgres.warehouse.public.customers
processor:
  type: orm-test-runner
  config:
    testCases:
      - name: customer_email_not_null
        testDefinitionName: columnValuesToBeNotNull
        columnName: email
      - name: customer_id_unique
        testDefinitionName: columnValuesToBeUnique
        columnName: customer_id
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: your-token-here
"""
runner = TestRunner.from_yaml(yaml_string=yaml_config)
# Run the loaded tests
results = runner.run()
By default, from_yaml() uses the connection configured via configure(). To use the connection defined in the YAML file instead, pass use_connection_from_yaml=True:
runner = TestRunner.from_yaml(
    file_path="tests/config.yaml",
    use_connection_from_yaml=True
)
YAML File Structure
A complete YAML configuration includes:
source:
  type: TestSuite
  serviceName: postgres_production
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: Postgres.analytics.public.user_events
processor:
  type: orm-test-runner
  config:
    forceUpdate: false
    testCases:
      # Table-level tests
      - name: table_row_count_validation
        testDefinitionName: tableRowCountToBeBetween
        parameterValues:
          - name: minValue
            value: "10000"
          - name: maxValue
            value: "1000000"
      # Column-level tests
      - name: user_id_not_null
        testDefinitionName: columnValuesToBeNotNull
        columnName: user_id
      - name: event_timestamp_format
        testDefinitionName: columnValuesToMatchRegex
        columnName: event_timestamp
        parameterValues:
          - name: regex
            value: "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z$"
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: ${OPENMETADATA_JWT_TOKEN}
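The jwtToken value above is written as a ${OPENMETADATA_JWT_TOKEN} placeholder. This page does not say whether from_yaml() expands such placeholders itself, so the following is only a minimal sketch, assuming you want to resolve them explicitly before passing the text as yaml_string. The helper name expand_env_placeholders is hypothetical and not part of the SDK; it relies only on the Python standard library.

```python
import os
import re

# Matches ${NAME} where NAME is a shell-style identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(text, env=None):
    """Replace ${NAME} placeholders in text with values from env.

    Hypothetical helper (not an SDK function). env defaults to os.environ.
    Raises KeyError listing every placeholder that has no value, so a
    missing token fails fast instead of reaching the server as "${...}".
    """
    env = os.environ if env is None else env
    missing = sorted({name for name in _PLACEHOLDER.findall(text) if name not in env})
    if missing:
        raise KeyError("Unset environment variables: " + ", ".join(missing))
    # A function replacement is inserted literally, so backslashes in the
    # value are not reinterpreted.
    return _PLACEHOLDER.sub(lambda match: env[match.group(1)], text)


# Possible usage with the loader shown earlier (left commented out because it
# needs the SDK and a real config file):
# from pathlib import Path
# yaml_text = expand_env_placeholders(Path("tests/config.yaml").read_text())
# runner = TestRunner.from_yaml(yaml_string=yaml_text)
```

A regular expression is used here rather than string.Template because only the ${NAME} form is touched; a bare $, such as the end anchor in the columnValuesToMatchRegex pattern above, passes through unchanged.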
Advanced TestRunner Configuration
Customizing Workflow Behavior
from metadata.sdk.data_quality import TestRunner, TableRowCountToBeBetween
from metadata.generated.schema.metadataIngestion.workflow import LogLevels

runner = TestRunner.for_table("BigQuery.analytics.events.user_sessions")

# Configure detailed settings
runner.setup(
    force_test_update=True,       # Update existing test definitions
    log_level=LogLevels.DEBUG,    # Enable debug logging
    raise_on_error=False,         # Continue on errors
    success_threshold=95,         # Require 95% success rate
    enable_streamable_logs=True   # Stream logs in real-time
)

# Add tests and run
runner.add_test(TableRowCountToBeBetween(min_count=1000))
results = runner.run()
Accessing Test Definitions
Inspect configured tests before running:
from metadata.sdk.data_quality import (
    TestRunner,
    TableRowCountToBeBetween,
    ColumnValuesToBeNotNull,
)

runner = TestRunner.for_table("MySQL.ecommerce.public.orders")
runner.add_tests(
    TableRowCountToBeBetween(min_count=100),
    ColumnValuesToBeNotNull(column="order_id")
)

# Access test definitions
for test_def in runner.test_definitions:
    print(f"Test: {test_def.testDefinitionName}")
    print(f"Parameters: {test_def.parameterValues}")
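When many tests are configured, a summary can be easier to scan than one printout per test. The following is a minimal sketch, assuming each item exposes the testDefinitionName attribute used in the loop above. The helper count_by_definition is hypothetical, and the SimpleNamespace objects are stand-ins so the example runs without the SDK.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_definition(test_definitions):
    """Count configured tests per testDefinitionName (hypothetical helper)."""
    return Counter(test_def.testDefinitionName for test_def in test_definitions)


# Stand-in objects for illustration only; with a real runner you would pass
# runner.test_definitions instead.
sample_definitions = [
    SimpleNamespace(testDefinitionName="tableRowCountToBeBetween", parameterValues=[]),
    SimpleNamespace(testDefinitionName="columnValuesToBeNotNull", parameterValues=[]),
    SimpleNamespace(testDefinitionName="columnValuesToBeNotNull", parameterValues=[]),
]

counts = count_by_definition(sample_definitions)
for definition_name, total in sorted(counts.items()):
    print(f"{definition_name}: {total}")
```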
Next Steps
Publishing Results & Best Practices
Publish results to OpenMetadata, implement error handling, generate tests dynamically, and apply production best practices.