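Integration with YuniKorn

Using the Spark operator with YuniKorn for batch scheduling

YuniKorn is an alternative scheduler for Kubernetes. Compared with the default scheduler, it provides batch scheduling capabilities that can significantly improve the experience of running Spark on Kubernetes. These capabilities include gang scheduling, application-aware job queueing, hierarchical resource quotas, and improved bin packing.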

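How to use

1. Install YuniKorn

Follow the Get Started guide in the YuniKorn documentation to install YuniKorn on your Kubernetes cluster.

Note: by default, YuniKorn installs an admission controller that sets the schedulerName field of every Pod to yunikorn. If you do not want this behavior, see the YuniKorn documentation for instructions on disabling it.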
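
As a rough sketch of what disabling the admission controller could look like, the fragment below assumes YuniKorn was installed from its Helm chart and that the chart exposes an embedAdmissionController value. Both are assumptions, not statements from this page, so confirm the key against the YuniKorn documentation for your chart version.

```yaml
# Values file for the YuniKorn Helm chart (assumed key name; verify for your chart version).
# With the admission controller disabled, only Pods that explicitly set
# schedulerName: yunikorn are handled by YuniKorn.
embedAdmissionController: false
```

With this setting, Pods that do not request YuniKorn keep using the default scheduler, while the Spark operator still sets schedulerName on the driver and executor Pods it creates for applications that select the yunikorn batch scheduler.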

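2. Enable batch scheduling in the controller

Set the following values in your operator installation to enable batch scheduling in the controller. You can optionally set a default batch scheduler, which applies to any SparkApplication that does not specify one in .spec.batchScheduler.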

controller:
  batchScheduler:
    enable: true
    # Setting the default batch scheduler is optional. The default only
    # applies if the batchScheduler field on the SparkApplication spec is not set
    default: yunikorn

3. Submit an application

Specify the batch scheduler in your SparkApplication as shown below. A complete example is available in the repository at examples/spark-pi-yunikorn.yaml.

spec:
  ...
  batchScheduler: yunikorn
  batchSchedulerOptions:
    queue: root.default

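With the above example, the Spark operator does the following:

  1. Annotates the driver pod with the task group annotations
  2. Sets the schedulerName field on the driver and executor pods to yunikorn
  3. Adds a queue label to the driver and executor pods if a queue is specified under batchSchedulerOptions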
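
To put the fragment and the steps above in context, here is a minimal sketch of a full SparkApplication manifest. The application name, namespace, image, and the two-executor layout mirror the Pod output shown elsewhere on this page. The main class, JAR path, service account name, and memory settings are illustrative assumptions, so prefer examples/spark-pi-yunikorn.yaml from the repository as the authoritative version.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-yunikorn
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.2
  # Assumed entry point and JAR location inside the image.
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.2.jar
  sparkVersion: 3.5.2
  driver:
    cores: 1
    memory: 512m                          # assumed value
    serviceAccount: spark-operator-spark  # assumed service account name
  executor:
    instances: 2
    cores: 1
    memory: 512m                          # assumed value
  # Selects YuniKorn for this application and routes it to a queue.
  batchScheduler: yunikorn
  batchSchedulerOptions:
    queue: root.default
```

Only the last three fields are specific to YuniKorn. The rest is an ordinary SparkApplication spec.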

For more information on gang scheduling, task groups, and queue routing, see the YuniKorn documentation.

You should see the following Pod events, which indicate that the Pod was gang scheduled by YuniKorn:

Type    Reason             Age   From      Message
----    ------             ----  ----      -------
Normal  Scheduling         20s   yunikorn  default/spark-pi-yunikorn-driver is queued and waiting for allocation
Normal  GangScheduling     20s   yunikorn  Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
Normal  Scheduled          19s   yunikorn  Successfully assigned default/spark-pi-yunikorn-driver to node spark-operator-worker
Normal  PodBindSuccessful  19s   yunikorn  Pod default/spark-pi-yunikorn-driver is successfully bound to node spark-operator-worker
Normal  TaskCompleted      4s    yunikorn  Task default/spark-pi-yunikorn-driver is completed
Normal  Pulling            20s   kubelet   Pulling image "spark:3.5.2"
Normal  Pulled             13s   kubelet   Successfully pulled image "spark:3.5.2" in 6.162s (6.162s including waiting)
Normal  Created            13s   kubelet   Created container spark-kubernetes-driver
Normal  Started            13s   kubelet   Started container spark-kubernetes-driver

The driver Pod should also carry the following annotations and labels:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    yunikorn.apache.org/allow-preemption: "true"
    yunikorn.apache.org/task-group-name: spark-driver
    yunikorn.apache.org/task-groups: '[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"896Mi"},"labels":{"queue":"root.default","version":"3.5.2"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"1","memory":"896Mi"},"labels":{"queue":"root.default","version":"3.5.2"}}]'
    yunikorn.apache.org/user.info: '{"user":"system:serviceaccount:spark-operator:spark-operator-controller","groups":["system:serviceaccounts","system:serviceaccounts:spark-operator","system:authenticated"]}'
  creationTimestamp: "2024-09-10T04:40:37Z"
  labels:
    queue: root.default
    spark-app-name: spark-pi-yunikorn
    spark-app-selector: spark-1bfe85bb77df4d5594337249b38c9648
    spark-role: driver
    spark-version: 3.5.2
    sparkoperator.k8s.io/app-name: spark-pi-yunikorn
    sparkoperator.k8s.io/launched-by-spark-operator: "true"
    sparkoperator.k8s.io/submission-id: 1a71de55-cdc7-4e62-b997-197883dc4cbe
    version: 3.5.2
  name: spark-pi-yunikorn-driver
  ...
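
The yunikorn.apache.org/task-groups annotation above is a single-line JSON string, which is hard to read. The same content is rendered below as YAML purely for readability; on the Pod, the value must remain a JSON string. It shows that the driver Pod carries the definitions for both task groups: one driver member and two executor members, each with a minimum resource of 1 CPU and 896Mi of memory.

```yaml
# Readable rendering of the task-groups annotation value (not a manifest to apply).
- name: spark-driver
  minMember: 1
  minResource:
    cpu: "1"
    memory: 896Mi
  labels:
    queue: root.default
    version: "3.5.2"
- name: spark-executor
  minMember: 2
  minResource:
    cpu: "1"
    memory: 896Mi
  labels:
    queue: root.default
    version: "3.5.2"
```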
