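This document describes how to deploy a highly available Airflow on a Kubernetes (k8s) cluster.
Before deploying Airflow, create the required directory structure on the NFS server (10.10.0.85) and set the correct permissions.
Create the Airflow directory structure:
# Create the dags and logs directories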
sudo mkdir -p /srv/nfs/airflow/{dags,logs}
# Set ownership and mode (50000:0 is the default UID:GID inside the Airflow container)
sudo chown -R 50000:0 /srv/nfs/airflow
sudo chmod -R 775 /srv/nfs/airflow
# Verify the directory permissions
ls -ld /srv/nfs/airflow/{dags,logs}
# Expected output:
# drwxrwxr-x 2 50000 root 4096 Feb 3 19:30 /srv/nfs/airflow/dags
# drwxrwxr-x 2 50000 root 4096 Feb 3 19:30 /srv/nfs/airflow/logs
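The same check can be scripted so that it is repeatable before each deployment. The following is a minimal sketch, assuming it runs on the NFS server itself; the helper name `check_dir` is hypothetical and not part of any tool used in this guide.

```python
import os
import stat


def check_dir(path, uid, gid, mode):
    """Return a list of problems found for `path`; an empty list means OK."""
    if not os.path.isdir(path):
        return [f"{path}: not a directory"]
    st = os.stat(path)
    problems = []
    if st.st_uid != uid:
        problems.append(f"{path}: uid is {st.st_uid}, expected {uid}")
    if st.st_gid != gid:
        problems.append(f"{path}: gid is {st.st_gid}, expected {gid}")
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    actual_mode = stat.S_IMODE(st.st_mode)
    if actual_mode != mode:
        problems.append(f"{path}: mode is {actual_mode:o}, expected {mode:o}")
    return problems
```

For the layout above, calling `check_dir("/srv/nfs/airflow/dags", 50000, 0, 0o775)` and the same for `logs` should return empty lists when ownership and mode are correct.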
Reload the NFS export configuration:
# Confirm that /etc/exports contains the following entry:
# /srv/nfs/airflow 10.10.0.0/16(rw,sync,no_subtree_check,no_root_squash)
# Reload the NFS exports
sudo exportfs -ra
# Verify the export status
sudo exportfs -v | grep airflow
# Expected output:
# /srv/nfs/airflow 10.10.0.0/16(rw,wdelay,no_root_squash,no_subtree_check,...)
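Important:
- The NFS export configuration (/etc/exports) only defines mount permissions; it does not create directories automatically.
- Create the directories manually and set the correct UID/GID (50000:0).
- Make sure the directory mode is 775 so that the Airflow containers can write to them.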
1. Install the NFS CSI Driver:
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs -n kube-system
2. Create the StorageClass:
sudo vi nfs-airflow-storage-class.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-airflow
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.10.0.85
  share: /srv/nfs/airflow
reclaimPolicy: Retain
volumeBindingMode: Immediate
Apply it:
kubectl apply -f nfs-airflow-storage-class.yml
Create airflow-dags-storage.yml:
sudo vi airflow-dags-storage.yml
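apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  # Must match the PVC's storageClassName so the claim binds to this
  # pre-created PV instead of triggering dynamic provisioning
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/dags
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  # Bind explicitly to the PV defined above
  volumeName: airflow-dags-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi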
Create airflow-logs-storage.yml:
sudo vi airflow-logs-storage.yml
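apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-logs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  # Must match the PVC's storageClassName so the claim binds to this
  # pre-created PV instead of triggering dynamic provisioning
  storageClassName: nfs-airflow
  nfs:
    path: /srv/nfs/airflow/logs
    server: 10.10.0.85
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs-pvc
  namespace: airflow
spec:
  storageClassName: nfs-airflow
  # Bind explicitly to the PV defined above
  volumeName: airflow-logs-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi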
Apply them:
kubectl create namespace airflow
kubectl apply -f airflow-dags-storage.yml
kubectl apply -f airflow-logs-storage.yml
Verify the PV/PVC status:
kubectl get pv | grep airflow
kubectl get pvc -n airflow
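Both claims should report a Bound status before continuing. As a rough illustration, the check could be automated by parsing the JSON form of the same kubectl command. This is a sketch only: the function names are hypothetical, and it assumes `kubectl` is on the PATH and already configured for this cluster.

```python
import json
import subprocess


def unbound_claims(pvc_list):
    """Return the names of PVCs whose status.phase is not 'Bound'."""
    return [
        item["metadata"]["name"]
        for item in pvc_list.get("items", [])
        if item.get("status", {}).get("phase") != "Bound"
    ]


def fetch_pvcs(namespace="airflow"):
    """Run `kubectl get pvc -n <namespace> -o json` and parse the output."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `unbound_claims(fetch_pvcs())` returns an empty list once both airflow-dags-pvc and airflow-logs-pvc are bound.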
1. Create the database and user:
Connect to the database:
psql -h 10.10.0.83 -p 5000 -U postgres
Run the following SQL statements:
-- 1. Create the user airflow
CREATE USER airflow WITH PASSWORD 'airflow';
-- 2. Create the database airflow_db owned by airflow
CREATE DATABASE airflow_db OWNER airflow;
-- 3. (Optional) Grant privileges
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;
-- Switch to airflow_db first: schema-level grants apply to the currently connected database
\c airflow_db
GRANT ALL PRIVILEGES ON SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO airflow;
-- Grant privileges on objects created in the future
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO airflow;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO airflow;
2. Verify the connection:
Confirm that you can connect with the new account:
psql -h 10.10.0.83 -p 5000 -U airflow -d airflow_db
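When psql is not available on the machine you are testing from (for example, inside a pod), a plain TCP probe can at least confirm that the database endpoint is reachable. The sketch below uses only the standard library; `tcp_reachable` is a hypothetical helper, and a successful probe does not prove that authentication works.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For this environment the call would be `tcp_reachable("10.10.0.83", 5000)`.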
Configure an image pull secret (if needed):
If the registry requires authentication, create a Secret and reference it in values.yml.
kubectl create secret docker-registry airflow-registry-secret \
--docker-server=10.10.0.85:50000 \
--docker-username=admin \
--docker-password=<your-password> \
--namespace airflow
Create a dedicated RabbitMQ account for Airflow (if one does not already exist):
Create airflow-rabbitmq-user.yaml and apply it:
sudo vi airflow-rabbitmq-user.yaml
apiVersion: v1
kind: Secret
metadata:
  name: airflow-rabbitmq-credentials
  namespace: airflow
type: Opaque
stringData:
  username: airflow
  password: airflow
---
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: airflow
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  tags:
    - management
  # The topology operator reads username/password from this Secret
  importCredentialsSecret:
    name: airflow-rabbitmq-credentials
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: airflow-permission
  namespace: airflow
spec:
  rabbitmqClusterReference:
    name: airflow-rabbitmq-cluster
  user: airflow
  vhost: /
  permissions:
    configure: ".*"
    write: ".*"
    read: ".*"
Apply it:
kubectl apply -f airflow-rabbitmq-user.yaml
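Create a dedicated namespace for Airflow:
kubectl create namespace airflow # only if it has not been created yet
According to the architecture design, Airflow's control components (Scheduler, API Server) run on the control plane nodes, while the workers run on the worker nodes.
Control plane nodes (doris-f01 to f03):
# Make sure these nodes carry this label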
kubectl label node doris-f01 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f02 node-role.kubernetes.io/control-plane="" --overwrite
kubectl label node doris-f03 node-role.kubernetes.io/control-plane="" --overwrite
Worker nodes (doris-b01 to b04):
# Make sure these nodes carry this label
kubectl label node doris-b01 role=worker --overwrite
kubectl label node doris-b02 role=worker --overwrite
kubectl label node doris-b03 role=worker --overwrite
kubectl label node doris-b04 role=worker --overwrite
For security, create the Secret that holds sensitive information manually rather than writing the values directly into the Helm values.
Generate a Fernet key:
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Example output: rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=
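If the cryptography package is not installed on the workstation, an equivalent key can be produced with the standard library alone, since a Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below also generates a random string for the webserver secret key; both function names are hypothetical.

```python
import base64
import os
import secrets


def generate_fernet_key():
    """Return 32 random bytes as URL-safe base64 text (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def generate_secret_key(nbytes=32):
    """Return a random hex string suitable for the webserver secret key."""
    return secrets.token_hex(nbytes)
```

Generate each value once and reuse it across all replicas; regenerating the Fernet key later would make previously encrypted connections and variables unreadable.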
Create the Secret:
kubectl create secret generic airflow-secrets \
--namespace airflow \
--from-literal=airflow-fernet-key='rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs=' \
--from-literal=airflow-webserver-secret='this-must-be-a-long-random-string-fixed-for-ha' \
--from-literal=metadata-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
--from-literal=result-backend-connection='postgresql://airflow:airflow@10.10.0.83:5000/airflow_db?sslmode=disable' \
--from-literal=broker-url='amqp://airflow:airflow@airflow-rabbitmq-cluster.airflow.svc.cluster.local:5672//'
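The connection strings above work as written because the sample password contains no reserved characters. With a real password that includes characters such as `@`, `/`, or `:`, the credentials need to be percent-encoded before they are placed in the URL. A minimal sketch, assuming a hypothetical helper named `postgres_url`:

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, db, sslmode="disable"):
    """Build a PostgreSQL connection URL with percent-encoded credentials."""
    # safe="" forces encoding of every reserved character, including "/" and ":".
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"postgresql://{enc_user}:{enc_password}@{host}:{port}/{db}?sslmode={sslmode}"
```

With the sample values from this guide, `postgres_url("airflow", "airflow", "10.10.0.83", 5000, "airflow_db")` reproduces the metadata-connection string used in the Secret.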
Because we need to install fping and grant it the NET_RAW capability, a custom Docker image is required.
sudo vi Dockerfile
FROM apache/airflow:3.0.2
USER root
RUN apt-get update && apt-get install -y --no-install-recommends fping iputils-ping libcap2-bin \
&& setcap cap_net_raw+ep /usr/bin/fping && setcap cap_net_raw+ep /usr/bin/ping \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install --no-cache-dir ping3==4.0.8
# 1. Build the image
podman build -t 10.10.0.85:50000/airflow-custom:1.0 .
# 2. Push it to the registry
podman push 10.10.0.85:50000/airflow-custom:1.0
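To illustrate what the custom image enables, the sketch below shows how task code running on a worker might call the fping binary installed above. It is an assumption-laden example rather than part of the deployment: `alive_hosts` is a hypothetical helper, it relies on fping's `-a` option printing only the alive targets to stdout, and the `runner` parameter exists only so that the function can be exercised without fping installed.

```python
import subprocess


def alive_hosts(hosts, runner=subprocess.run):
    """Return the subset of `hosts` that fping reports as alive, in input order."""
    if not hosts:
        return []
    # fping exits non-zero when any target is unreachable, so the return
    # code is deliberately not checked; only stdout is inspected.
    result = runner(["fping", "-a", *hosts], capture_output=True, text=True)
    reported = {line.strip() for line in result.stdout.splitlines() if line.strip()}
    return [host for host in hosts if host in reported]
```

Inside a worker container built from this image, `alive_hosts(["10.10.0.83", "10.10.0.85"])` would run fping with the NET_RAW capability granted by setcap and the container securityContext.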
helm repo add apache-airflow https://airflow.apache.org
helm repo update
Create values.yml with the following content (be sure to check the database and broker connection details):
fullnameOverride: "airflow"
useStandardNaming: true
images:
  airflow:
    repository: 10.10.0.85:50000/airflow-custom
    tag: "1.0"
    pullPolicy: Always
executor: "CeleryExecutor"
postgresql:
  enabled: false
redis:
  enabled: false
data:
  metadataConnection:
    user: "airflow"
    pass: "airflow"
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    sslmode: disable
  brokerUrl: "amqp://airflow:airflow@airflow-rabbitmq-cluster:5672/"
  resultBackendConnection:
    protocol: postgresql
    host: "10.10.0.83"
    port: 5000
    db: "airflow_db"
    user: "airflow"
    pass: "airflow"
    sslmode: disable
migrateDatabaseJob:
  nodeSelector:
    role: worker
webserverSecretKey: "this-must-be-a-long-random-string-fixed-for-ha"
fernetKey: "rv638BORYwOheHEXB6JoROvDgR3r9vdrOHnYcQfl0gs="
dags:
  persistence:
    enabled: true
    existingClaim: airflow-dags-pvc
logs:
  persistence:
    enabled: true
    existingClaim: airflow-logs-pvc
# Keep the apiServer section (this environment requires it)
apiServer:
  replicas: 3
  service:
    type: NodePort
    ports:
      - name: airflow-ui
        port: 8080
        nodePort: 30080
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
scheduler:
  replicas: 1
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
workers:
  podManagementPolicy: Parallel
  replicas: 4
  nodeSelector:
    role: worker
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi
  persistence:
    enabled: true
    size: 5Gi
    storageClassName: "nfs-airflow"
  env:
    - name: TZ
      value: "Asia/Taipei"
  securityContexts:
    container:
      capabilities:
        add:
          - NET_RAW
flower:
  enabled: true
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"
      effect: "NoSchedule"
  service:
    type: NodePort
dagProcessor:
  nodeSelector:
    role: worker
triggerer:
  nodeSelector:
    role: worker
  persistence:
    enabled: false
config:
  core:
    max_map_length: 100000
  webserver:
    base_url: "http://10.10.0.83:8080"
    enable_proxy_fix: "True"
    cookie_secure: 'False'
    cookie_samesite: 'Lax'
    session_backend: 'database'
  celery:
    worker_concurrency: 4
    task_acks_late: "True"
    worker_prefetch_multiplier: 1
Deploy using the values.yml above.
helm upgrade --install airflow apache-airflow/airflow \
--namespace airflow \
--version 1.18.0 \
-f values.yml \
--set images.airflow.repository=10.10.0.85:50000/airflow-custom \
--set images.airflow.tag=1.0 \
--set images.airflow.pullPolicy=Always \
--debug
Note: for --version, choose the chart version that matches your Airflow version according to the Airflow version compatibility table.
kubectl get pods -n airflow -o wide -w
Confirm that all pods (API Server, Scheduler, Worker, RabbitMQ, and so on) are in the Running state.
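Watching the pod list by eye works for a first deployment; for repeated rollouts the same check could be scripted against the JSON output of `kubectl get pods -n airflow -o json`. The following is a sketch with a hypothetical function name. It treats a pod as healthy when it is Running with every container ready, or when it has Succeeded (as completed Job pods do).

```python
def unhealthy_pods(pod_list):
    """Return (name, phase) pairs for pods that are neither ready nor completed."""
    bad = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            # Completed Job pods are expected to stop; they are not failures.
            continue
        containers = status.get("containerStatuses", [])
        all_ready = all(c.get("ready", False) for c in containers)
        if phase != "Running" or not all_ready:
            bad.append((name, phase))
    return bad
```

The input is the parsed JSON document, for example from `json.load(sys.stdin)` when the kubectl output is piped into the script.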
Access the Airflow web UI directly from a browser:
http://10.10.0.83:8080
Login: admin / admin
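Besides opening the UI, the deployment can be probed through Airflow's health endpoint. The sketch below assumes that the Airflow 3 API server exposes it at `/api/v2/monitor/health` and that the response maps component names to objects with a `status` field; verify both against the Airflow version you deploy. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def unhealthy_components(health):
    """Return the sorted names of components whose status is not 'healthy'."""
    return sorted(
        name
        for name, info in health.items()
        if not isinstance(info, dict) or info.get("status") != "healthy"
    )


def fetch_health(base_url, timeout=5.0):
    """Fetch and parse the health document (endpoint path is an assumption)."""
    url = f"{base_url.rstrip('/')}/api/v2/monitor/health"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For this environment, `unhealthy_components(fetch_health("http://10.10.0.83:8080"))` should return an empty list when every reported component is healthy.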