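Integration with Volcano for Batch Scheduling

Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that Kubernetes currently lacks and that many classes of batch and elastic workloads commonly require. Integrating with Volcano lets the Pods of a Spark application be scheduled more efficiently.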

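Volcano components

Before using the Apache Spark Kubernetes Operator with Volcano enabled, make sure Volcano has been installed successfully in the same environment. See the Volcano Quick Start Guide for installation instructions.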

Install the Apache Spark Kubernetes Operator with Volcano enabled

With the Helm chart, the Apache Spark Kubernetes Operator can be installed with Volcano enabled using the following commands:

helm repo add spark-operator https://kubeflow.github.io/spark-operator

helm install my-release spark-operator/spark-operator \
    --namespace spark-operator \
    --set webhook.enable=true \
    --set batchScheduler.enable=true
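
The same two settings can also be kept in a Helm values file instead of being passed as --set flags. The fragment below is a minimal sketch of that equivalent; the file name values-volcano.yaml is a hypothetical choice, and the keys simply mirror the flags used in the command above.

```yaml
# values-volcano.yaml (hypothetical file name)
# Mirrors --set webhook.enable=true and --set batchScheduler.enable=true
webhook:
  enable: true
batchScheduler:
  enable: true
```

Such a file would be supplied to helm install with the -f option in place of the two --set flags.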

Run a Spark Application with the Volcano scheduler

Now we can run an updated Spark Application (with batchScheduler configured), for example:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.1
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar
  sparkVersion: 3.5.1
  batchScheduler: volcano # Note: the batch scheduler name must be set to `volcano`
  restartPolicy:
    type: Never
  volumes:
    - name: test-volume
      hostPath:
        path: /tmp
        type: Directory
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    labels:
      version: 3.5.1
    serviceAccount: spark
    volumeMounts:
      - name: test-volume
        mountPath: /tmp
  executor:
    cores: 1
    instances: 1
    memory: 512m
    labels:
      version: 3.5.1
    volumeMounts:
      - name: test-volume
        mountPath: "/tmp"

While the application is running, the Pod's events can be used to verify that the Pod was scheduled by Volcano:

Type    Reason     Age   From                          Message
----    ------     ----  ----                          -------
Normal  Scheduled  23s   volcano                       Successfully assigned default/spark-pi-driver to integration-worker2

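Technical details

When a SparkApplication is configured to run with Volcano, the following underlying mechanisms tie the two systems together:

The webhook of the Apache Spark Kubernetes Operator patches each Pod's schedulerName according to the batchScheduler field in the SparkApplication spec.
Before submitting the Spark Application, the Apache Spark Kubernetes Operator creates a Volcano-native PodGroup resource for the whole application. In short, most of Volcano's advanced scheduling features, such as delayed Pod creation, resource fairness, and gang scheduling, depend on this resource. A new Pod annotation named scheduling.k8s.io/group-name is also added.
The Volcano scheduler then takes over scheduling of all Pods that carry both the correct schedulerName and the annotation.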
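
To make the steps above concrete, the fragment below sketches which fields of a driver Pod would carry the Volcano-related settings after patching. It is illustrative only: the annotation value is a placeholder, since the actual PodGroup name is generated by the operator, and every other Pod field is omitted.

```yaml
# Illustrative fragment of a patched driver Pod (not a complete manifest)
apiVersion: v1
kind: Pod
metadata:
  name: spark-pi-driver
  namespace: default
  annotations:
    # Placeholder: the real value is the name of the PodGroup created by the operator
    scheduling.k8s.io/group-name: "<podgroup-name>"
spec:
  # Set by the webhook from spec.batchScheduler of the SparkApplication
  schedulerName: volcano
```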

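The Apache Spark Kubernetes Operator gives end users fine-grained control over batch scheduling through the BatchSchedulerOptions attribute. BatchSchedulerOptions is a string dictionary that different batch schedulers can use to expose different attributes. Volcano currently supports the following attributes: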

Name	Description	Example
queue	Specifies which Volcano queue this Spark Application belongs to	batchSchedulerOptions: {queue: "queue1"}
priorityClassName	Specifies which priorityClass this Spark Application will use	batchSchedulerOptions: {priorityClassName: "pri1"}
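
As a sketch of how these attributes fit into a manifest, the fragment below adds both options to the spec of the spark-pi example and, for completeness, defines a matching Volcano Queue. It assumes that a PriorityClass named pri1 already exists in the cluster; the names queue1 and pri1 are taken from the table, and the queue weight is an arbitrary illustrative value.

```yaml
# Hypothetical Volcano queue matching the name used in the table
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue1
spec:
  weight: 1   # illustrative value
---
# Fragment of the SparkApplication; all other fields stay as in the example above
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  batchScheduler: volcano
  batchSchedulerOptions:
    queue: queue1              # Volcano queue the application belongs to
    priorityClassName: pri1    # assumed to exist as a PriorityClass
```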

Last modified June 22, 2024: Add Spark Operator documentation (#3767) (b622672)