OpenMetadata Deployment on Azure Kubernetes Service Cluster

OpenMetadata can be deployed on Azure Kubernetes Service (AKS). The deployment requires some cloud-specific configuration, mainly setting up storage accounts for Airflow, which is one of OpenMetadata's dependencies.

Prerequisites

Azure SQL for the Database and Elastic Cloud for the Search Engine

We recommend Azure SQL and Elastic Cloud on Azure for production deployments. We support:
  • Azure SQL (MySQL) engine version 8 or higher
  • Azure SQL (PostgreSQL) engine version 12 or higher
  • Elastic Cloud (Elasticsearch version 8.11.4)
Once Azure SQL and Elastic Cloud on Azure are configured, update the values below so that the OpenMetadata Kubernetes deployment can connect to the database and Elasticsearch.
# openmetadata-values.prod.yaml
...
openmetadata:
  config:
    elasticsearch:
      host: <ELASTIC_CLOUD_ENDPOINT_WITHOUT_HTTPS>
      searchType: elasticsearch
      port: 443
      scheme: https
      connectionTimeoutSecs: 5
      socketTimeoutSecs: 60
      keepAliveTimeoutSecs: 600
      batchSize: 10
      auth:
        enabled: true
        username: <ELASTIC_CLOUD_USERNAME>
        password:
          secretRef: elasticsearch-secrets
          secretKey: openmetadata-elasticsearch-password
    database:
      host: <AZURE_SQL_ENDPOINT>
      port: 3306
      driverClass: com.mysql.cj.jdbc.Driver
      dbScheme: mysql
      dbUseSSL: true
      databaseName: <AZURE_SQL_DATABASE_NAME>
      auth:
        username: <AZURE_SQL_DATABASE_USERNAME>
        password:
          secretRef: mysql-secrets
          secretKey: openmetadata-mysql-password
  ...
We recommend:
  • Azure SQL with multi-zone availability and the Production workload environment
  • An Elastic Cloud environment that spans multiple zones and has at least 2 nodes
Make sure to create the database and Elastic Cloud credentials as Kubernetes Secrets, as mentioned here. Also disable MySQL and Elasticsearch in the OpenMetadata Dependencies Helm chart, as mentioned in the FAQs here.
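The values file above refers to two secrets, mysql-secrets and elasticsearch-secrets, with the keys openmetadata-mysql-password and openmetadata-elasticsearch-password. A minimal sketch of creating them with kubectl is shown here. The helper name create_openmetadata_secrets and its arguments are hypothetical, and the sketch assumes the target namespace already exists (it is created in Step 2).

```shell
# Sketch: create the secrets referenced by openmetadata-values.prod.yaml.
# create_openmetadata_secrets is a hypothetical helper, not part of OpenMetadata.
create_openmetadata_secrets() {
  namespace="$1"      # namespace OpenMetadata will be installed into
  db_password="$2"    # password of the Azure SQL user
  es_password="$3"    # password of the Elastic Cloud user

  # Secret name and key match database.auth.password in the values file.
  kubectl create secret generic mysql-secrets \
    --namespace "$namespace" \
    --from-literal=openmetadata-mysql-password="$db_password" || return 1

  # Secret name and key match elasticsearch.auth.password in the values file.
  kubectl create secret generic elasticsearch-secrets \
    --namespace "$namespace" \
    --from-literal=openmetadata-elasticsearch-password="$es_password"
}
```

It could be called as create_openmetadata_secrets openmetadata '<AZURE_SQL_PASSWORD>' '<ELASTIC_CLOUD_PASSWORD>', where both password placeholders are yours to fill in.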

Step 1 - Create an AKS cluster

If you are deploying on a new cluster, pass EnableAzureDiskFileCSIDriver=true as a custom header to enable the Container Storage Interface (CSI) storage drivers.
az aks create   --resource-group  MyResourceGroup    \
                --name MyAKSClusterName              \
                --nodepool-name agentpool            \
                --outbound-type loadbalancer         \
                --location YourPreferredLocation     \
                --generate-ssh-keys                  \
                --enable-addons monitoring           \
                --aks-custom-headers EnableAzureDiskFileCSIDriver=true

For an existing cluster, enable the CSI storage drivers:
az aks update -n MyAKSClusterName -g MyResourceGroup --enable-disk-driver --enable-file-driver
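The kubectl commands in the following steps need credentials for the cluster. The sketch below fetches them with az aks get-credentials and then checks that the azurefile-csi storage class used in Step 3 is present. The helper name connect_and_check_storage is hypothetical, and the resource group and cluster name are the same placeholders used in the commands above.

```shell
# Sketch: point kubectl at the AKS cluster and confirm the CSI storage class.
# connect_and_check_storage is a hypothetical helper name.
connect_and_check_storage() {
  resource_group="$1"
  cluster_name="$2"

  # Merge the cluster's credentials into the local kubeconfig.
  az aks get-credentials --resource-group "$resource_group" --name "$cluster_name" || return 1

  # The persistent volume claims in Step 3 rely on this storage class.
  kubectl get storageclass azurefile-csi
}
```

For example: connect_and_check_storage MyResourceGroup MyAKSClusterName. If the last command reports that the storage class is not found, the file CSI driver is probably not enabled on the cluster.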

Step 2 - Create a Namespace (optional)

kubectl create namespace openmetadata

Step 3 - Create Persistent Volumes

The OpenMetadata Helm chart depends on Airflow, and Airflow expects a persistent disk that supports ReadWriteMany (the volume can be mounted as read-write by many nodes). The Azure CSI storage drivers we enabled earlier can provision volumes in ReadWriteMany mode. The manifest below requests two such volumes through the azurefile-csi storage class.
# logs_dags_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-dags-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: azurefile-csi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-logs-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: azurefile-csi
Create the volume claims by applying the manifest.
kubectl apply -f logs_dags_pvc.yaml
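Before moving on, it can help to confirm that both claims were provisioned. The sketch below prints the phase of each claim from the manifest. The helper name check_openmetadata_pvcs is hypothetical, and the expectation that the claims reach Bound shortly after creation is an assumption about the cluster's azurefile-csi storage class.

```shell
# Sketch: print the phase of each claim defined in logs_dags_pvc.yaml.
# check_openmetadata_pvcs is a hypothetical helper name.
check_openmetadata_pvcs() {
  namespace="${1:-openmetadata}"

  for claim in openmetadata-dependencies-dags-pvc openmetadata-dependencies-logs-pvc; do
    # .status.phase is Pending until a volume is provisioned, then Bound.
    phase=$(kubectl get pvc "$claim" --namespace "$namespace" -o jsonpath='{.status.phase}') || return 1
    echo "$claim: $phase"
  done
}
```

Calling check_openmetadata_pvcs with no arguments checks the openmetadata namespace. A claim that stays Pending can be inspected with kubectl describe pvc.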

Step 4 - Change owner and update permissions for persistent volumes

Airflow pods run as a non-root user and lack write access to our persistent volumes. To fix this, we create a job, permissions_pod.yaml, that runs a pod which mounts the persistent volume claims and changes the owner of the mounted folders /airflow-dags and /airflow-logs to user ID 50000, which is the default Linux user ID of Airflow pods.
# permissions_pod.yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    run: my-permission-pod
  name: my-permission-pod
  namespace: openmetadata
spec:
  template:
    spec:
      containers:
      - image: busybox
        name: my-permission-pod
        volumeMounts:
        - name: airflow-dags
          mountPath: /airflow-dags
        - name: airflow-logs
          mountPath: /airflow-logs
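        # Both commands go in one string: anything listed after the "-c" script
        # is passed to sh as a positional argument and is never executed.
        command: ["/bin/sh", "-c", "chown -R 50000 /airflow-dags /airflow-logs && chmod -R a+rwx /airflow-dags"]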
      restartPolicy: Never
      volumes:
      - name: airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs-pvc
      - name: airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags-pvc
Start the job by applying the manifest in permissions_pod.yaml.
kubectl apply -f permissions_pod.yaml
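The job runs asynchronously, so it is worth waiting for it to finish before installing the Helm charts. A minimal sketch, assuming the job name my-permission-pod and the openmetadata namespace from the manifest above; the helper name wait_for_permission_job and the 120s default timeout are hypothetical choices.

```shell
# Sketch: wait for the permissions job and show its output.
# wait_for_permission_job is a hypothetical helper name.
wait_for_permission_job() {
  namespace="${1:-openmetadata}"
  timeout="${2:-120s}"

  # Block until the Job reports the Complete condition, or fail after the timeout.
  kubectl wait --for=condition=complete job/my-permission-pod \
    --namespace "$namespace" --timeout="$timeout" || return 1

  # Print whatever the chown/chmod commands wrote to stdout or stderr.
  kubectl logs job/my-permission-pod --namespace "$namespace"
}
```

If the wait times out, kubectl describe job my-permission-pod --namespace openmetadata usually shows why the pod did not complete.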