Airflow on k8s HA Installation Guide

This document describes how to deploy a highly available Airflow on a Kubernetes (k8s) cluster.

0. Prerequisites

  • Kubernetes: deployed and running normally.
  • PostgreSQL: deployed and running normally.
  • RabbitMQ: deployed and running normally.

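1. Base Configuration

1.0 Prepare the NFS Storage Directories

Before deploying Airflow, create the required directory structure on the NFS server (10.10.0.85) and set the correct permissions.

Create the Airflow directory structure:

# Create the dags and logs directories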
sudo mkdir -p /srv/nfs/airflow/{dags,logs}

# Set ownership and permissions (50000:0 is the default UID:GID inside the Airflow container)
sudo chown -R 50000:0 /srv/nfs/airflow
sudo chmod -R 775 /srv/nfs/airflow

# Verify the directory permissions
ls -ld /srv/nfs/airflow/{dags,logs}
# Expected output:
# drwxrwxr-x 2 50000 root 4096 Feb  3 19:30 /srv/nfs/airflow/dags
# drwxrwxr-x 2 50000 root 4096 Feb  3 19:30 /srv/nfs/airflow/logs

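Reload the NFS export configuration:

# Confirm that /etc/exports contains the following entry:
# /srv/nfs/airflow 10.10.0.0/16(rw,sync,no_subtree_check,no_root_squash)

# Reload the NFS exports
sudo exportfs -ra

# Verify the export status
sudo exportfs -v | grep airflow
# Expected output:
# /srv/nfs/airflow  10.10.0.0/16(rw,wdelay,no_root_squash,no_subtree_check,...)

Important notes:

  • The NFS export configuration (/etc/exports) only defines mount permissions; it does not create directories automatically.
  • You must create the directories manually and set the correct UID/GID (50000:0).
  • Make sure the directory mode is 775 so that the Airflow containers can write to it.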
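
The ownership and mode requirements above can be checked with a small script before moving on. The following is a minimal sketch, assuming GNU stat is available on the NFS server; check_nfs_dir is a hypothetical helper name introduced here for illustration only.

```shell
# Sketch: report whether a directory has the expected numeric owner and mode.
# check_nfs_dir is a hypothetical helper, not part of any tool.
check_nfs_dir() {
  local dir="$1" want_owner="$2" want_mode="$3" actual
  # GNU stat format: %u = numeric UID, %g = numeric GID, %a = octal mode
  actual="$(stat -c '%u:%g %a' "$dir")" || return 2
  if [ "$actual" = "$want_owner $want_mode" ]; then
    echo "OK   $dir ($actual)"
  else
    echo "FAIL $dir: got '$actual', want '$want_owner $want_mode'"
    return 1
  fi
}
```

For example, check_nfs_dir /srv/nfs/airflow/dags 50000:0 775 and check_nfs_dir /srv/nfs/airflow/logs 50000:0 775 should both report OK once the steps in this section are complete.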

1.1 Storage Class Setup

1. Install the NFS CSI Driver:

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs -n kube-system

2. Create the StorageClass:

sudo vi nfs-airflow-storage-class.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-airflow
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.10.0.85
  share: /srv/nfs/airflow
reclaimPolicy: Retain
volumeBindingMode: Immediate

Apply it:

kubectl apply -f nfs-airflow-storage-class.yml

1.2 Create the Static PV/PVC for Airflow

Create airflow-dags-storage.yml

sudo vi airflow-dags-storage.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
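spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  # Must match the PVC's storageClassName; a PV without a class cannot bind to a claim that requests one
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/dags
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  # Pin the claim to the static PV above instead of dynamically provisioning a new volume
  volumeName: airflow-dags-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi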

Create airflow-logs-storage.yml

sudo vi airflow-logs-storage.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs-pv
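spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  # Must match the PVC's storageClassName; a PV without a class cannot bind to a claim that requests one
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/logs
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  # Pin the claim to the static PV above instead of dynamically provisioning a new volume
  volumeName: airflow-logs-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi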

Apply them:

kubectl create namespace airflow
kubectl apply -f airflow-dags-storage.yml
kubectl apply -f airflow-logs-storage.yml

Verify the PV/PVC status:

kubectl get pv | grep airflow
kubectl get pvc -n airflow
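
Beyond reading the table output, it can help to confirm that each claim is bound to the intended static PV and not to some other volume. The following is a minimal sketch, assuming kubectl is already configured for the cluster; check_claim is a hypothetical helper name.

```shell
# Sketch: succeed only if the claim is Bound to the expected PersistentVolume.
# check_claim is a hypothetical helper, not a kubectl feature.
check_claim() {
  local pvc="$1" want_pv="$2" ns="${3:-airflow}" got
  # jsonpath prints "<phase> <volumeName>", e.g. "Bound airflow-dags-pv"
  got="$(kubectl get pvc "$pvc" -n "$ns" \
    -o jsonpath='{.status.phase} {.spec.volumeName}')" || return 2
  if [ "$got" = "Bound $want_pv" ]; then
    echo "OK   $pvc -> $want_pv"
  else
    echo "FAIL $pvc: got '$got', want 'Bound $want_pv'"
    return 1
  fi
}
```

Typical usage would be check_claim airflow-dags-pvc airflow-dags-pv followed by check_claim airflow-logs-pvc airflow-logs-pv. A claim that reports a volume name other than the expected one was most likely provisioned dynamically by the StorageClass.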

1.3 PostgreSQL Setup

1. Create the database and user:

Connect to the database server:

psql -h 10.10.0.83 -p 5000 -U postgres

Run the following SQL statements:

-- 1. Create the airflow user
CREATE USER airflow WITH PASSWORD 'airflow';

-- 2. Create the airflow_db database owned by airflow
CREATE DATABASE airflow_db OWNER airflow;

-- 3. (Optional) Grant privileges
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;

-- Switch to airflow_db so that the schema-level grants below apply to it
-- rather than to the database of the current session
\c airflow_db
GRANT ALL PRIVILEGES ON SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO airflow;

-- Grant privileges on objects created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO airflow;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO airflow;

2. Verify the connection:

Confirm that you can connect with the new account: psql -h 10.10.0.83 -p 5000 -U airflow -d airflow_db

1.4 Container Registry Setup

Configure an image pull secret (if needed):

If the registry requires authentication, create a Secret and reference it in values.yml.

kubectl create secret docker-registry airflow-registry-secret \
  --docker-server=10.10.0.85:50000 \
  --docker-username=admin \
  --docker-password=<your-password> \
  --namespace airflow

1.5 RabbitMQ Setup

Create a dedicated Airflow account (if it does not exist yet):

Create airflow-rabbitmq-user.yaml and apply it:

sudo vi airflow-rabbitmq-user.yaml
apiVersion: v1
kind: Secret
metadata:
  name: airflow-rabbitmq-credentials
  namespace: airflow
type: Opaque
stringData:
  username: airflow
  password: airflow
---
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: airflow
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  tags:
    - management
  credentials:
    secretName: airflow-rabbitmq-credentials
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: airflow-permission
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  user: airflow
  vhost: /
  permissions:
    configure: ".*"
    write: ".*"
    read: ".*"

Apply it:

kubectl apply -f airflow-rabbitmq-user.yaml

2. Environment Setup

2.1 Create the Namespace

Create a dedicated namespace for Airflow:

kubectl create namespace airflow # only if it has not been created yet

2.2 Set Node Labels

By design, the Airflow control components (Scheduler, API Server/Webserver) run on the control plane nodes, while the Workers run on the worker nodes.

Control plane nodes (doris-f01 ~ f03):

# Make sure these nodes carry this label
kubectl label node doris-f01 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f02 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f03 node-role.kubernetes.io/control-plane="" --overwrite

Worker nodes (doris-b01 ~ b04):

# Make sure these nodes carry this label
kubectl label node doris-b01 role=worker --overwrite
kubectl label node doris-b02 role=worker --overwrite
kubectl label node doris-b03 role=worker --overwrite
kubectl label node doris-b04 role=worker --overwrite
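
After labeling, the node counts per label can be checked against the layout described above (three control plane nodes, four worker nodes). The following is a minimal sketch, assuming kubectl access to the cluster; expect_nodes is a hypothetical helper name.

```shell
# Sketch: compare the number of nodes matching a label selector with an expected count.
# expect_nodes is a hypothetical helper, not a kubectl feature.
expect_nodes() {
  local selector="$1" want="$2" got
  # --no-headers leaves one line per node; grep -c . counts the non-empty lines
  got="$(kubectl get nodes -l "$selector" --no-headers 2>/dev/null | grep -c . || true)"
  if [ "$got" -eq "$want" ]; then
    echo "OK   $selector: $got node(s)"
  else
    echo "FAIL $selector: got $got node(s), want $want"
    return 1
  fi
}
```

With the node names used in this guide, expect_nodes node-role.kubernetes.io/control-plane 3 and expect_nodes role=worker 4 should both report OK.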

2.3 Create Kubernetes Secrets (if needed)

For security, create the Secret containing sensitive values manually rather than writing them directly into the Helm values.

Generate a Fernet key:

python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Example output: rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=

Create the Secret:

kubectl create secret generic airflow-secrets \
    --namespace airflow \
    --from-literal=airflow-fernet-key='rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=' \
    --from-literal=airflow-webserver-secret='this-must-be-a-long-random-string-fixed-for-ha' \
    --from-literal=metadata-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
    --from-literal=result-backend-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
    --from-literal=broker-url='amqp://airflow:airflow@rabbitmq-cluster.rabbitmq-system.svc.cluster.local:5672//'
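
Because the Fernet key is pasted by hand, a truncated or mistyped value is an easy mistake to make. A Fernet key is 32 bytes encoded as URL-safe base64, which gives 44 characters. The following is a minimal sketch of a sanity check to run before creating the Secret, assuming the base64 utility from coreutils; fernet_key_ok is a hypothetical helper name.

```shell
# Sketch: succeed only if the argument looks like a well-formed Fernet key.
# fernet_key_ok is a hypothetical helper; it checks the shape of the key only.
fernet_key_ok() {
  local key="$1" nbytes
  # 32 bytes in base64 are always 44 characters, including one '=' of padding
  [ "${#key}" -eq 44 ] || return 1
  # Map the URL-safe alphabet ('-' and '_') back to standard base64 before decoding
  nbytes="$(printf '%s' "$key" | tr '_-' '/+' | base64 -d 2>/dev/null | wc -c)"
  [ "$nbytes" -eq 32 ]
}
```

For example, fernet_key_ok "$KEY" && echo "key looks valid", where KEY holds the generated value. The check does not prove that the key matches data already encrypted in the metadata database.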

3. Build and Push the Docker Image

Because we need to install fping and grant it the NET_RAW capability, a custom Docker image is required.

3.1 Prepare the Dockerfile

sudo vi Dockerfile
FROM apache/airflow:3.0.2
USER root
RUN apt-get update && apt-get install -y --no-install-recommends fping iputils-ping libcap2-bin \
 && setcap cap_net_raw+ep /usr/bin/fping && setcap cap_net_raw+ep /usr/bin/ping \
 && apt-get clean && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install --no-cache-dir ping3==4.0.8

3.2 Build and Push

# 1. Build the image
podman build -t 10.10.0.85:50000/airflow-custom:1.0 .

# 2. Push to the registry
podman push 10.10.0.85:50000/airflow-custom:1.0
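
Before relying on the image, it may be worth confirming that the file capabilities set in the Dockerfile survived the build. The following is a minimal sketch, assuming getcap (installed by libcap2-bin in the Dockerfile above) is on the image's PATH; check_image_caps is a hypothetical helper name.

```shell
# Sketch: verify that fping and ping inside the image carry cap_net_raw.
# check_image_caps is a hypothetical helper, not a podman feature.
check_image_caps() {
  local image="$1" out bin
  # Override the image entrypoint so that getcap runs directly
  out="$(podman run --rm --entrypoint getcap "$image" /usr/bin/fping /usr/bin/ping)" || return 2
  for bin in /usr/bin/fping /usr/bin/ping; do
    if printf '%s\n' "$out" | grep "^$bin " | grep -q 'cap_net_raw'; then
      echo "OK   $bin has cap_net_raw"
    else
      echo "FAIL $bin is missing cap_net_raw"
      return 1
    fi
  done
}
```

For the image built above, the call would be check_image_caps 10.10.0.85:50000/airflow-custom:1.0.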

4. Deploy Airflow with Helm

4.1 Add the Airflow Helm Repo

helm repo add apache-airflow https://airflow.apache.org
helm repo update

4.2 Prepare the Values File

Create values.yml with the following content (be sure to check the database and broker connection details):

fullnameOverride: "airflow"

useStandardNaming: true

images:
  airflow:
    repository: 10.10.0.85:50000/airflow-custom
    tag: "1.0"
    pullPolicy: Always

executor: "CeleryExecutor"

postgresql:
  enabled: false
redis:
  enabled: false

data:
  metadataConnection:
    user: "airflow"
    pass: "airflow"
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    sslmode: disable
  brokerUrl: "amqp://airflow:airflow@airflow-rabbitmq-cluster:5672/"
  resultBackendConnection:
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    user: "airflow"
    pass: "airflow"
    sslmode: disable

migrateDatabaseJob:
  nodeSelector:
    role: worker

webserverSecretKey: "this-must-be-a-long-random-string-fixed-for-ha"
fernetKey: "rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs="

dags:
  persistence:
    enabled: true
    existingClaim: airflow-dags-pvc
logs:
  persistence:
    enabled: true
    existingClaim: airflow-logs-pvc

# Keep the apiServer configuration (this environment requires it)
apiServer:
  replicas: 3
  service:
    type: NodePort
    ports:
      - name: airflow-ui
        port: 8080
        nodePort: 30080
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"

scheduler:
  replicas: 1
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"

workers:
  podManagementPolicy: Parallel
  replicas: 4
  nodeSelector:
    role: worker
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi
  persistence:
    enabled: true
    size: 5Gi
    storageClassName: "nfs-airflow"
  env:
    - name: TZ
      value: "Asia/Taipei"
  securityContexts:
    container:
      capabilities:
        add:
          - NET_RAW

flower:
  enabled: true
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
  service:
    type: NodePort

dagProcessor:
  nodeSelector:
    role: worker

triggerer:
  nodeSelector:
    role: worker
  persistence:
    enabled: false

config:
  core:
    max_map_length: 100000
  webserver:
    base_url: "http://10.10.0.83:8080"
    enable_proxy_fix: "True"
    cookie_secure: 'False'
    cookie_samesite: 'Lax'
    session_backend: 'database'

  celery:
    worker_concurrency: 4
    task_acks_late: "True"
    worker_prefetch_multiplier: 1

4.3 Deploy

Deploy using the values.yml file above.

helm upgrade --install airflow apache-airflow/airflow \
  --namespace airflow \
  --version 1.18.0 \
  -f values.yml \
  --set images.airflow.repository=10.10.0.85:50000/airflow-custom \
  --set images.airflow.tag=1.0 \
  --set images.airflow.pullPolicy=Always \
  --debug

Note: for --version, choose a chart version that matches your Airflow version according to the Airflow version mapping table.
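
Once the Helm command returns, the workloads may still be rolling out. The following is a minimal sketch that waits for every Deployment and StatefulSet in the namespace, assuming kubectl access to the cluster. wait_rollouts is a hypothetical helper name, and the 300s default timeout is an arbitrary choice.

```shell
# Sketch: wait for all Deployments and StatefulSets in a namespace to finish rolling out.
# wait_rollouts is a hypothetical helper, not a kubectl or helm feature.
wait_rollouts() {
  local ns="${1:-airflow}" timeout="${2:-300s}" res rc=0
  # -o name prints entries such as deployment.apps/<name>, one per line
  for res in $(kubectl get deployment,statefulset -n "$ns" -o name); do
    kubectl rollout status "$res" -n "$ns" --timeout="$timeout" || rc=1
  done
  return "$rc"
}
```

Running wait_rollouts airflow 600s returns a non-zero status if any workload fails to become ready within the timeout. Listing the resources dynamically avoids hard-coding names that depend on the chart's naming options.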


5. Verify the Deployment

5.1 Check Pod Status

kubectl get pods -n airflow -o wide -w

Confirm that all Pods (API Server, Scheduler, DAG Processor, Triggerer, Worker, Flower) are in the Running state.

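5.2 Access the Web UI

Open the Airflow web UI directly in a browser:

  • URL: http://10.10.0.83:8080
  • Username/password: admin / admin by default

5.3 Verify Airflow Operation

  1. Log in to the Web UI.
  2. Confirm that the home page displays normally with no error messages.
  3. Confirm that Cluster Activity or the DAGs list loads normally.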
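
The manual checks above can be complemented by a scripted health probe. The following is a minimal sketch, assuming the Airflow 3.x health endpoint is served at /api/v2/monitor/health and returns a JSON object with a "status" field per component. Both the path and the response shape are assumptions to verify against your Airflow version. health_ok and check_airflow_health are hypothetical helper names.

```shell
# Sketch: read health JSON on stdin; succeed only if at least one component
# reports "healthy" and none reports "unhealthy". health_ok is a hypothetical helper.
health_ok() {
  local body statuses
  body="$(cat)"
  statuses="$(printf '%s' "$body" | grep -o '"status": *"[a-z]*"' || true)"
  [ -n "$statuses" ] || return 1
  if printf '%s\n' "$statuses" | grep -q '"unhealthy"'; then
    return 1
  fi
  printf '%s\n' "$statuses" | grep -q '"healthy"'
}

# Assumed endpoint path for Airflow 3.x; the base URL defaults to the one used in this guide.
check_airflow_health() {
  curl -fsS "${1:-http://10.10.0.83:8080}/api/v2/monitor/health" | health_ok
}
```

For example, check_airflow_health && echo "Airflow reports healthy". A failed request yields an empty body, which the helper treats as unhealthy.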