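Integration with YuniKorn
Using the Spark operator with YuniKorn for batch scheduling
Warning
This feature is supported only in operator version 2.0.0 and later.
YuniKorn is an alternative scheduler for Kubernetes. Compared with the default scheduler, it adds batch scheduling capabilities that can significantly improve the experience of running Spark on Kubernetes: gang scheduling, application-aware job queues, hierarchical resource quotas, and improved bin packing.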
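How to use
1. Install YuniKorn
Follow the Get Started guide in the YuniKorn documentation to install YuniKorn on your Kubernetes cluster.
Note: By default, YuniKorn installs an admission controller that sets the schedulerName field on every Pod to yunikorn. If you do not want this behavior, see the YuniKorn documentation for instructions on disabling it.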
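As a rough illustration of disabling the admission controller, the fragment below assumes YuniKorn is installed through its Helm chart and that the chart exposes an embedAdmissionController value. Both are assumptions that this page does not confirm, so check them against the values.yaml of the chart version you install.

```yaml
# Helm values for the YuniKorn chart (assumed installation method).
# embedAdmissionController is assumed to be the chart value that controls
# whether the admission controller is deployed alongside the scheduler.
embedAdmissionController: false
```

With the admission controller disabled, only Pods that explicitly set schedulerName to yunikorn are handled by YuniKorn. The Spark operator sets this field on driver and executor Pods when yunikorn is selected as the batch scheduler.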
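2. Enable batch scheduling in the controller
Set the following values in your operator installation to enable batch scheduling in the controller. Optionally, you can also set a default batch scheduler, which applies to any SparkApplication that does not specify one in .spec.batchScheduler.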
controller:
  batchScheduler:
    enable: true
    # Setting the default batch scheduler is optional. The default only
    # applies if the batchScheduler field on the SparkApplication spec is not set
    default: yunikorn
3. Submit an application
Specify the batch scheduler in your SparkApplication as shown below. A complete example is available in the repository at examples/spark-pi-yunikorn.yaml.
spec:
  ...
  batchScheduler: yunikorn
  batchSchedulerOptions:
    queue: root.default
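For context, a full manifest built around these two fields might look like the sketch below. Only batchScheduler and batchSchedulerOptions.queue come from the fragment above. The application name, namespace, image, and executor count were chosen to match the Pod events and metadata shown later on this page. The main class, jar path, memory sizes, and service account name are illustrative assumptions, so treat examples/spark-pi-yunikorn.yaml in the repository as the authoritative version.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-yunikorn
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.2
  sparkVersion: 3.5.2
  mainClass: org.apache.spark.examples.SparkPi
  # Assumed jar location inside the image; adjust for your image.
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  driver:
    cores: 1
    memory: 512m
    # Assumed service account name; use the one created by your installation.
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
  # The two settings that route this application through YuniKorn.
  batchScheduler: yunikorn
  batchSchedulerOptions:
    queue: root.default
```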
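With the example above, the Spark operator does the following:
- Annotates the driver pod with task group annotations
- Sets the schedulerName field on the driver and executor pods to yunikorn
- Adds the queue label to the driver and executor pods if a queue is specified under batchSchedulerOptions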
For more information on gang scheduling, task groups, and queue routing, see the YuniKorn documentation.
You should see the following Pod events, which indicate that the Pod was gang scheduled by YuniKorn:
Type    Reason             Age   From      Message
----    ------             ----  ----      -------
Normal  Scheduling         20s   yunikorn  default/spark-pi-yunikorn-driver is queued and waiting for allocation
Normal  GangScheduling     20s   yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
Normal  Scheduled          19s   yunikorn  Successfully assigned default/spark-pi-yunikorn-driver to node spark-operator-worker
Normal  PodBindSuccessful  19s   yunikorn  Pod default/spark-pi-yunikorn-driver is successfully bound to node spark-operator-worker
Normal  TaskCompleted      4s    yunikorn  Task default/spark-pi-yunikorn-driver is completed
Normal  Pulling            20s   kubelet   Pulling image "spark:3.5.2"
Normal  Pulled             13s   kubelet   Successfully pulled image "spark:3.5.2" in 6.162s (6.162s including waiting)
Normal  Created            13s   kubelet   Created container spark-kubernetes-driver
Normal  Started            13s   kubelet   Started container spark-kubernetes-driver
The driver Pod should also carry the following annotations and labels:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    yunikorn.apache.org/allow-preemption: "true"
    yunikorn.apache.org/task-group-name: spark-driver
    yunikorn.apache.org/task-groups: '[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"896Mi"},"labels":{"queue":"root.default","version":"3.5.2"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"1","memory":"896Mi"},"labels":{"queue":"root.default","version":"3.5.2"}}]'
    yunikorn.apache.org/user.info: '{"user":"system:serviceaccount:spark-operator:spark-operator-controller","groups":["system:serviceaccounts","system:serviceaccounts:spark-operator","system:authenticated"]}'
  creationTimestamp: "2024-09-10T04:40:37Z"
  labels:
    queue: root.default
    spark-app-name: spark-pi-yunikorn
    spark-app-selector: spark-1bfe85bb77df4d5594337249b38c9648
    spark-role: driver
    spark-version: 3.5.2
    sparkoperator.k8s.io/app-name: spark-pi-yunikorn
    sparkoperator.k8s.io/launched-by-spark-operator: "true"
    sparkoperator.k8s.io/submission-id: 1a71de55-cdc7-4e62-b997-197883dc4cbe
    version: 3.5.2
  name: spark-pi-yunikorn-driver
  ...
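The yunikorn.apache.org/task-groups value is dense when read as a single line. The block below rewrites that same JSON as YAML, purely as a reading aid. On the Pod itself the value stays a single-line JSON string, and nothing here should be pasted into a manifest.

```yaml
# Expanded view of the yunikorn.apache.org/task-groups annotation value.
# Reading aid only: the real annotation is a JSON string, not nested YAML.
- name: spark-driver
  minMember: 1            # one driver Pod in the gang
  minResource:
    cpu: "1"
    memory: 896Mi
  labels:
    queue: root.default
    version: "3.5.2"
- name: spark-executor
  minMember: 2            # two executor Pods in the gang
  minResource:
    cpu: "1"
    memory: 896Mi
  labels:
    queue: root.default
    version: "3.5.2"
```

Read this way, the annotation declares a gang of one driver and two executors, each needing 1 CPU and 896Mi of memory. The 896Mi figure would be consistent with a 512m memory request plus Spark's default minimum memory overhead of 384MiB, although the memory request itself is an assumption because this page does not show it.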
Last modified September 23, 2024: Add page for Spark operator/Yunikorn integration (#3872) (69b339c)