# Airflow on k8s HA Installation Guide

This document describes how to deploy a highly available Airflow on a Kubernetes (k8s) cluster.

## 0. Prerequisites

- **Kubernetes**: Deployed and running normally.
- **PostgreSQL**: Deployed and running normally.
- **RabbitMQ**: Deployed and running normally.

---

## 1. Basic Configuration

### 1.0 Prepare the NFS Storage Directories

Before deploying Airflow, create the required directory structure on the NFS server (10.10.0.85) and set the correct permissions.

**Create the Airflow directory structure:**

```bash
# Create the dags and logs directories
sudo mkdir -p /srv/nfs/airflow/{dags,logs}

# Set ownership and permissions (50000:0 is the default UID:GID inside the Airflow container)
sudo chown -R 50000:0 /srv/nfs/airflow
sudo chmod -R 775 /srv/nfs/airflow

# Verify the directory permissions
ls -ld /srv/nfs/airflow/{dags,logs}
# Expected output:
# drwxrwxr-x 2 50000 root 4096 Feb 3 19:30 /srv/nfs/airflow/dags
# drwxrwxr-x 2 50000 root 4096 Feb 3 19:30 /srv/nfs/airflow/logs
```

**Reload the NFS export configuration:**

```bash
# Make sure /etc/exports contains the following entry:
# /srv/nfs/airflow 10.10.0.0/16(rw,sync,no_subtree_check,no_root_squash)

# Reload the NFS exports
sudo exportfs -ra

# Verify the export status
sudo exportfs -v | grep airflow
# Expected output:
# /srv/nfs/airflow 10.10.0.0/16(rw,wdelay,no_root_squash,no_subtree_check,...)
```

> **Important**:
> - The NFS export configuration (`/etc/exports`) only defines mount permissions. It does not create the directories.
> - You must create the directories manually and set the correct UID/GID (50000:0).
> - Make sure the directory permissions are 775 so that the Airflow containers can write to them.

---

### 1.1 StorageClass Configuration

**1. Install the NFS CSI driver:**

```bash
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs -n kube-system
```

**2. Create the StorageClass:**

```bash
sudo vi nfs-airflow-storage-class.yml
```

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-airflow
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.10.0.85
  share: /srv/nfs/airflow
reclaimPolicy: Retain
volumeBindingMode: Immediate
```

Apply it:

```bash
kubectl apply -f nfs-airflow-storage-class.yml
```

### 1.2 Create Static PVs/PVCs for Airflow

The PVs below point directly at the `dags` and `logs` directories created in section 1.0. Each PV declares `storageClassName: nfs-airflow`, and each PVC pins its PV with `volumeName`. Without these two fields, a PVC that requests the `nfs-airflow` class does not bind to the static PV. Instead, the CSI driver dynamically provisions a new subdirectory under `/srv/nfs/airflow`.

Create `airflow-dags-storage.yml`:

```bash
sudo vi airflow-dags-storage.yml
```

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/dags
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  volumeName: airflow-dags-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

Create `airflow-logs-storage.yml`:

```bash
sudo vi airflow-logs-storage.yml
```

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/logs
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  volumeName: airflow-logs-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

Apply them:

```bash
kubectl create namespace airflow
kubectl apply -f airflow-dags-storage.yml
kubectl apply -f airflow-logs-storage.yml
```

Verify the PV/PVC status. Both PVCs should be `Bound` to `airflow-dags-pv` and `airflow-logs-pv`, respectively:

```bash
kubectl get pv | grep airflow
kubectl get pvc -n airflow
```

### 1.3 PostgreSQL Configuration

**1. Create the database and user:**

Connect to the database:

```bash
psql -h 10.10.0.83 -p 5000 -U postgres
```

Run the following SQL statements:

```sql
-- 1. Create the airflow user
CREATE USER airflow WITH PASSWORD 'airflow';

-- 2. Create the airflow_db database, owned by airflow
CREATE DATABASE airflow_db OWNER airflow;

-- 3. (Optional) Grant privileges
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;

-- Schema-level grants apply to the database you are currently connected to,
-- so switch to airflow_db first (psql meta-command)
\c airflow_db
GRANT ALL PRIVILEGES ON SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO airflow;

-- Grant privileges on objects created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO airflow;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO airflow;
```

**2. Verify the connection:**

Confirm that you can connect with the new account: `psql -h 10.10.0.83 -p 5000 -U airflow -d airflow_db`

### 1.4 Container Registry Configuration

**Configure an image pull secret (if needed):**

If the registry requires authentication, create a Secret and reference it in `values.yml` (for example, through the chart's `registry.secretName` value).

```bash
kubectl create secret docker-registry airflow-registry-secret \
  --docker-server=10.10.0.85:50000 \
  --docker-username=admin \
  --docker-password= \
  --namespace airflow
```

### 1.5 RabbitMQ Configuration

**Create a dedicated Airflow account (if one does not exist yet):**

The `User` and `Permission` resources below are handled by the RabbitMQ Messaging Topology Operator. They assume that a `RabbitmqCluster` named `airflow-rabbitmq-cluster` exists in the `airflow` namespace.

Create `airflow-rabbitmq-user.yaml` and apply it:

```bash
sudo vi airflow-rabbitmq-user.yaml
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: airflow-rabbitmq-credentials
  namespace: airflow
type: Opaque
stringData:
  username: airflow
  password: airflow
---
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: airflow
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  tags:
    - management
  credentials:
    secretName: airflow-rabbitmq-credentials
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: airflow-permission
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  user: airflow
  vhost: /
  permissions:
    configure: ".*"
    write: ".*"
    read: ".*"
```

Apply it:

```bash
kubectl apply -f airflow-rabbitmq-user.yaml
```

---

## 2. Environment Setup

### 2.1 Create the Namespace

Create a dedicated namespace for Airflow:

```bash
kubectl create namespace airflow  # Only if it does not exist yet
```

### 2.2 Configure Node Labels

In this architecture, the Airflow control components run on the control plane nodes. These are the Scheduler, the API Server (Web UI), and Flower. The following components run on the worker nodes:

- Celery Workers
- DAG Processor
- Triggerer
- Database migration job

Control plane nodes usually carry the `node-role.kubernetes.io/control-plane:NoSchedule` taint, which is why the components placed there define a matching toleration in `values.yml`.

**Control plane nodes (doris-f01 ~ f03):**

```bash
# Make sure these nodes carry this label
kubectl label node doris-f01 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f02 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f03 node-role.kubernetes.io/control-plane="" --overwrite
```

**Worker nodes (doris-b01 ~ b04):**

```bash
# Make sure these nodes carry this label
kubectl label node doris-b01 role=worker --overwrite
kubectl label node doris-b02 role=worker --overwrite
kubectl label node doris-b03 role=worker --overwrite
kubectl label node doris-b04 role=worker --overwrite
```

### 2.3 Create Kubernetes Secrets (if needed)

For security, create a Secret containing the sensitive settings manually instead of writing them directly into the Helm values.

**Generate a Fernet key (requires the Python `cryptography` package):**

```bash
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Example output (generate your own key; do not reuse this one):
# rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=
```

**Create the Secret:**

```bash
kubectl create secret generic airflow-secrets \
  --namespace airflow \
  --from-literal=airflow-fernet-key='rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=' \
  --from-literal=airflow-webserver-secret='this-must-be-a-long-random-string-fixed-for-ha' \
  --from-literal=metadata-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
  --from-literal=result-backend-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
  --from-literal=broker-url='amqp://airflow:airflow@rabbitmq-cluster.rabbitmq-system.svc.cluster.local:5672//'
```

> **Note**:
> - The `values.yml` in section 4.2 sets `fernetKey`, `webserverSecretKey`, and the connection settings inline and does not reference this Secret. Keep the values in both places identical.
> - Check that the broker host matches the RabbitMQ cluster actually in use. This Secret points to `rabbitmq-cluster.rabbitmq-system.svc.cluster.local`, while section 1.5 and `values.yml` use `airflow-rabbitmq-cluster`.

---

## 3. Build and Push the Docker Image

A custom Docker image is required because `fping` must be installed and granted the `NET_RAW` capability.

### 3.1 Prepare the Dockerfile

```bash
sudo vi Dockerfile
```

```dockerfile
FROM apache/airflow:3.0.2

USER root
# Install fping/ping and grant them the raw-socket capability
RUN apt-get update \
    && apt-get install -y --no-install-recommends fping iputils-ping libcap2-bin \
    && setcap cap_net_raw+ep /usr/bin/fping \
    && setcap cap_net_raw+ep /usr/bin/ping \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

USER airflow
RUN pip install --no-cache-dir ping3==4.0.8
```

The file capabilities set with `setcap` only take effect if the container's capability bounding set includes `NET_RAW`. This is why the workers add it under `securityContexts` in `values.yml`.

### 3.2 Build and Push

```bash
# 1. Build the image
podman build -t 10.10.0.85:50000/airflow-custom:1.0 .

# 2. Push it to the registry
podman push 10.10.0.85:50000/airflow-custom:1.0
```

---

## 4. Deploy Airflow with Helm

### 4.1 Add the Airflow Helm Repo

```bash
helm repo add apache-airflow https://airflow.apache.org
helm repo update
```

### 4.2 Prepare the Values File

Create `values.yml` with the following content. Be sure to double-check the database and broker connection settings:

```yaml
fullnameOverride: "airflow"
useStandardNaming: true

images:
  airflow:
    repository: 10.10.0.85:50000/airflow-custom
    tag: "1.0"
    pullPolicy: Always

executor: "CeleryExecutor"

postgresql:
  enabled: false

redis:
  enabled: false

data:
  metadataConnection:
    user: "airflow"
    pass: "airflow"
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    sslmode: disable
  brokerUrl: "amqp://airflow:airflow@airflow-rabbitmq-cluster:5672/"
  resultBackendConnection:
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    user: "airflow"
    pass: "airflow"
    sslmode: disable

migrateDatabaseJob:
  nodeSelector:
    role: worker

webserverSecretKey: "this-must-be-a-long-random-string-fixed-for-ha"
fernetKey: "rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs="

dags:
  persistence:
    enabled: true
    existingClaim: airflow-dags-pvc

logs:
  persistence:
    enabled: true
    existingClaim: airflow-logs-pvc

# Keep the apiServer configuration (required in this environment)
apiServer:
  replicas: 3
  service:
    type: NodePort
    ports:
      - name: airflow-ui
        port: 8080
        nodePort: 30080
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"

scheduler:
  replicas: 1
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"

workers:
  podManagementPolicy: Parallel
  replicas: 4
  nodeSelector:
    role: worker
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi
  persistence:
    enabled: true
    size: 5Gi
    storageClassName: "nfs-airflow"
  env:
    - name: TZ
      value: "Asia/Taipei"
  securityContexts:
    container:
      capabilities:
        add:
          - NET_RAW

flower:
  enabled: true
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
  service:
    type: NodePort

dagProcessor:
  nodeSelector:
    role: worker

triggerer:
  nodeSelector:
    role: worker
  persistence:
    enabled: false

config:
  core:
    max_map_length: 100000
  webserver:
    base_url: "http://10.10.0.83:8080"
    enable_proxy_fix: "True"
    cookie_secure: 'False'
    cookie_samesite: 'Lax'
    session_backend: 'database'
  celery:
    worker_concurrency: 4
    task_acks_late: "True"
    worker_prefetch_multiplier: 1
```

> **Note**: This configuration runs a single scheduler (`scheduler.replicas: 1`). Airflow supports running more than one scheduler against a PostgreSQL metadata database if scheduler redundancy is also required.

### 4.3 Deploy

Deploy using the `values.yml` above:

```bash
helm upgrade --install airflow apache-airflow/airflow \
  --namespace airflow \
  --version 1.18.0 \
  -f values.yml \
  --set images.airflow.repository=10.10.0.85:50000/airflow-custom \
  --set images.airflow.tag=1.0 \
  --set images.airflow.pullPolicy=Always \
  --debug
```

The `--set` flags repeat the image settings already in `values.yml`. Helm gives values passed with `--set` precedence over those from `-f` files.

> **Note**: For `--version`, choose the chart version that matches your Airflow version according to the chart/Airflow version compatibility table.

---

## 5. Verify the Deployment

### 5.1 Check Pod Status

```bash
kubectl get pods -n airflow -o wide -w
```

Make sure that all Airflow Pods are in the `Running` state. This includes the API Server, Scheduler, DAG Processor, Triggerer, Workers, and Flower, plus the RabbitMQ cluster if it runs in the `airflow` namespace. Redis is disabled in `values.yml`, so no Redis Pod is expected. Job Pods, such as the database migration job, end in `Completed`.

### 5.2 Access the Web UI

Open the Airflow Web UI directly in a browser:

* **URL**: `http://10.10.0.83:8080`
* **Username/password**: `admin` / `admin` by default

The API Server Service is exposed as NodePort `30080`, as configured under `apiServer.service` in `values.yml`. The URL above assumes that `10.10.0.83:8080` forwards to this NodePort, matching `base_url`. Otherwise, open `http://<node-ip>:30080`.

### 5.3 Verify Airflow Operation

1. Log in to the Web UI.
2. Confirm that the home page displays normally without error messages.
3. Confirm that Cluster Activity or the DAG list loads normally.
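The checks in 5.3 can also be scripted. The sketch below queries the API server's health endpoint and reports any component that is not healthy. It rests on two assumptions that this guide does not state. First, Airflow 3 serves `GET /api/v2/monitor/health` without authentication. Second, each component in the JSON reply carries a `status` field. The script name `airflow_health.py` and the `HEALTH_PATH` constant are illustrative only.

```python
"""Sketch: check Airflow component health through the API server.

Assumptions (not from this guide): Airflow 3 serves an unauthenticated
GET /api/v2/monitor/health, and each component in the JSON reply has a
"status" field.
"""
import json
import sys
import urllib.request

HEALTH_PATH = "/api/v2/monitor/health"  # assumed Airflow 3 endpoint path


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the health JSON from the API server."""
    url = base_url.rstrip("/") + HEALTH_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_components(health: dict) -> list:
    """Return the sorted names of components whose status is not 'healthy'.

    A missing or null status is treated as unhealthy.
    """
    return sorted(
        name
        for name, info in health.items()
        if isinstance(info, dict) and info.get("status") != "healthy"
    )


def main(base_url: str) -> int:
    try:
        health = fetch_health(base_url)
    except OSError as exc:  # urllib's URLError/HTTPError subclass OSError
        print(f"cannot reach {base_url}: {exc}")
        return 2
    # Print every component's status, then summarize failures
    for name, info in sorted(health.items()):
        if isinstance(info, dict):
            print(f"{name}: {info.get('status')}")
    bad = unhealthy_components(health)
    if bad:
        print("unhealthy: " + ", ".join(bad))
        return 1
    print("all components healthy")
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    print(f"usage: {sys.argv[0]} BASE_URL  (for example http://10.10.0.83:8080)")
```

Run it as `python3 airflow_health.py http://10.10.0.83:8080`. The exit codes are:

- `0` when every component reports `healthy`
- `1` when at least one component is unhealthy
- `2` when the API server cannot be reached

If your auth manager protects the endpoint, the request fails with an HTTP error, which is reported as exit code `2`.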