Kubernetes资源调度之节点亲和
创始人
2024-03-02 00:46:22
0

Kubernetes资源调度之节点亲和

Pod节点选择器

nodeSelector指定的标签选择器过滤符合条件的节点作为可用目标节点,最终选择则基于打分机制完成。因此,后者也称为节点选择器。用户事先为特定部分的Node资源对象设定好标签,而后即可配置Pod通过节点选择器实现类似于节点的强制亲和调度。

可以通过下面的命令查看每个node上的标签

[root@k8s-01 ~]# kubectl get nodes --show-labels
NAME     STATUS   ROLES                  AGE   VERSION   LABELS
k8s-01   Ready    control-plane,master   26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8s-02   Ready                     26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-02,kubernetes.io/os=linux,type=kong,ype=kong
k8s-03   Ready                     26d   v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-03,kubernetes.io/os=linux,type=kong
k8s-04   Ready                     8d    v1.22.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-04,kubernetes.io/os=linux
[root@k8s-01 ~]#

给node4节点打上标签

[root@k8s-01 ~]# kubectl label nodes k8s-04 type=test
node/k8s-04 labeled

测试的yaml

echo "
apiVersion: v1
kind: Pod
metadata:labels:app: nginx-testname: nginx-test
spec:containers:- image: nginxname: nginx-testnodeSelector:type: test
" | kubectl apply -f -

查看pod的分配情况

[root@k8s-01 ~]# kubectl get pods  -o wide |grep nginx-test
nginx-test                                1/1     Running   0               58s     10.244.7.81      k8s-04              
[root@k8s-01 ~]#

然后我们可以通过 describe 命令查看调度结果:

image-20220627165806382

事实上,多数情况下用户都无须关心Pod对象的具体运行位置,除非Pod依赖的特殊条件仅能由部分节点满足时,例如GPU和SSD等。即便如此,也应该尽量避免使用.spec.nodeName静态指定Pod对象的运行位置,而是应该让调度器基于标签和标签选择器为Pod挑选匹配的工作节点。另外,Pod规范中的.spec.nodeSelector仅支持简单等值关系的节点选择器,而.spec.affinity.nodeAffinity支持更灵活的节点选择器表达式,而且可以实现硬亲和与软亲和逻辑。

节点亲和调度

节点亲和性(nodeAffinity)主要是用来控制 Pod 要部署在哪些节点上,以及不能部署在哪些节点上的,它可以进行一些简单的逻辑组合了,不只是简单的相等匹配。在Pod上定义节点亲和规则时有两种类型的节点亲和关系:强制(required)亲和和首选(preferred)亲和,或分别称为硬亲和与软亲和。

  • 强制亲和限定了调度Pod资源时必须要满足的规则,无可用节点时Pod对象会被置为Pending状态,直到满足规则的节点出现。
  • 柔性亲和规则实现的是一种柔性调度限制,它同样倾向于将Pod运行在某类特定的节点之上,但无法满足调度需求时,调度器将选择一个无法匹配规则的节点,而非将Pod置于Pending状态。

在Pod规范上定义节点亲和规则的关键点有两个:

  • 一是给节点规划并配置合乎期望的标签;
  • 二是为Pod对象定义合理的标签选择器。

正如preferredDuringSchedulingIgnoredDuringExecution和requiredDuringSchedulingIgnoredDuringExecution字段名字中的后半段符串IgnoredDuringExecution隐含的意义所指,在Pod资源基于节点亲和规则调度至某节点之后,因节点标签发生了改变而变得不再符合Pod定义的亲和规则时,调度器也不会将Pod从此节点上移出,因而亲和调度仅在调度执行的过程中进行一次即时的判断,而非持续地监视亲和规则是否能够得以满足。

查看官方说明:

[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1RESOURCE: nodeAffinity DESCRIPTION:Describes node affinity scheduling rules for the pod.Node affinity is a group of node affinity scheduling rules.FIELDS:preferredDuringSchedulingIgnoredDuringExecution      <[]Object>The scheduler will prefer to schedule pods to nodes that satisfy theaffinity expressions specified by this field, but it may choose a node thatviolates one or more of the expressions. The node that is most preferred isthe one with the greatest sum of weights, i.e. for each node that meets allof the scheduling requirements (resource request, requiredDuringSchedulingaffinity expressions, etc.), compute a sum by iterating through theelements of this field and adding "weight" to the sum if the node matchesthe corresponding matchExpressions; the node(s) with the highest sum arethe most preferred.requiredDuringSchedulingIgnoredDuringExecution       If the affinity requirements specified by this field are not met atscheduling time, the pod will not be scheduled onto the node. If theaffinity requirements specified by this field cease to be met at some pointduring pod execution (e.g. due to an update), the system may or may not tryto eventually evict the pod from its node.[root@k8s-01 ~]# 

强制亲和

查看官方说明:

[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Pod
VERSION:  v1RESOURCE: requiredDuringSchedulingIgnoredDuringExecution DESCRIPTION:If the affinity requirements specified by this field are not met atscheduling time, the pod will not be scheduled onto the node. If theaffinity requirements specified by this field cease to be met at some pointduring pod execution (e.g. due to an update), the system may or may not tryto eventually evict the pod from its node.A node selector represents the union of the results of one or more labelqueries over a set of nodes; that is, it represents the OR of the selectorsrepresented by the node selector terms.FIELDS:nodeSelectorTerms    <[]Object> -required-Required. A list of node selector terms. The terms are ORed.[root@k8s-01 ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
KIND:     Pod
VERSION:  v1RESOURCE: nodeSelectorTerms <[]Object>DESCRIPTION:Required. A list of node selector terms. The terms are ORed.A null or empty node selector term matches no objects. The requirements ofthem are ANDed. The TopologySelectorTerm type implements a subset of theNodeSelectorTerm.FIELDS:matchExpressions     <[]Object>A list of node selector requirements by node's labels.matchFields  <[]Object>A list of node selector requirements by node's fields.[root@k8s-01 ~]# 
  • matchExpressions:标签选择器表达式,基于节点标签进行过滤;可重复使用以表达不同的匹配条件,各条件间为“或”关系。▪
  • matchFields:以字段选择器表达的节点选择器;可重复使用以表达不同的匹配条件,各条件间为“或”关系。

注意:每个匹配条件可由一到多个匹配规则组成,例如某个matchExpressions条件下可同时存在两个表达式规则,如下面的示例所示,同一条件下的各条规则彼此间为“逻辑与”关系。这意味着某节点满足nodeSelectorTerms中的任意一个条件即可,但满足某个条件指的是可完全匹配该条件下定义的所有规则。

下面开始测试,给node3打上标签

[root@k8s-01 ~]# kubectl label nodes k8s-03 disktype=ssd
node/k8s-03 labeled

测试yaml

apiVersion: v1
kind: Pod
metadata:name: "busy-affinity"labels:app: "busy-affinity"
spec:containers:- name: busy-affinityimage: "busybox"command: ["/bin/sh","-c","sleep 600"]affinity:nodeAffinity:requiredDuringSchedulingIgnoredDuringExecution: ## 硬标准,后面是调度期间有效,忽略执行期间nodeSelectorTerms:- matchExpressions:- key: disktypevalues: ["ssd","hdd"]operator: In

查看pod

[root@k8s-01 ~]# kubectl get pods -o wide |grep busy-affinity
busy-affinity                             1/1     Running   0               76s     10.244.165.209   k8s-03              
[root@k8s-01 ~]#

从结果可以看出 Pod 被部署到了 node3 节点上,因为我们的硬策略就是部署到含有"ssd","hdd"标签的node节点上,所以会尽量满足。这里的匹配逻辑是 label 标签的值在某个列表中,现在 Kubernetes 提供的操作符有下面的几种:

  • In:label 的值在某个列表中
  • NotIn:label 的值不在某个列表中
  • Gt:label 的值大于某个值
  • Lt:label 的值小于某个值
  • Exists:某个 label 存在
  • DoesNotExist:某个 label 不存在

下面给node3和node4加上所有的标签,用于测试

[root@k8s-01 ~]#  kubectl label nodes k8s-04 disktype=hdd
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-04 disk=60
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-04 gpu=3080
node/k8s-04 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-03 gpu=3090
node/k8s-03 labeled
[root@k8s-01 ~]# kubectl label nodes k8s-03 disk=30
node/k8s-03 labeled
[root@k8s-01 ~]#

柔性亲和

节点首选亲和机制为节点选择机制提供了一种柔性控制逻辑,被调度的Pod对象不再是“必须”,而是“应该”放置到某些特定节点之上,但条件不满足时,该Pod也能够接受被编排到其他不符合条件的节点之上。另外,多个柔性亲和条件并存时,它还支持为每个条件定义weight属性以区别它们优先级,取值范围是1~100,数字越大优先级越高。

修改后的yaml,加了柔性亲和

apiVersion: v1
kind: Pod
metadata:name: "busy-affinity"labels:app: "busy-affinity"
spec:containers:- name: busy-affinityimage: "busybox"command: ["/bin/sh","-c","sleep 600"]affinity:nodeAffinity:requiredDuringSchedulingIgnoredDuringExecution: ## 硬标准,后面是调度期间有效,忽略执行期间nodeSelectorTerms:- matchExpressions:- key: disktypevalues: ["ssd","hdd"]operator: InpreferredDuringSchedulingIgnoredDuringExecution:  ## 软性分- preference: ## 指定我们喜欢的条件 matchExpressions:- key: diskvalues: ["40"]operator: Gt   ## node3的disk不满足这个条件,node4满足weight: 70 ## 权重 0-100- preference: ## 指定我们喜欢的条件matchExpressions:- key: gpuvalues: ["3080"]operator: Gt  ## node4的gpu不满足这个条件,node3满足weight: 30 ## 权重 0-100

因为在硬性条件中,node3和node4都满足硬性条件,在查看一下软性条件,其中node3满足第二个,node4满足第一个,但是第一个的权重高一些,所以最终选择了node4,查看运行的pods

[root@k8s-01 ~]# kubectl get pods -o wide
NAME                                      READY   STATUS    RESTARTS        AGE     IP               NODE     NOMINATED NODE   READINESS GATES
busy-affinity                             1/1     Running   0               34s     10.244.7.82      k8s-04              
counter                                   1/1     Running   0               8d      10.244.165.198   k8s-03              
nfs-client-provisioner-69b76b8dc6-ms4xg   1/1     Running   1 (14d ago)     26d     10.244.179.21    k8s-02              
nginx-5759cb8dcc-t4sdn                    1/1     Running   0               7d6h    10.244.179.50    k8s-02              
nginx-liveness                            1/1     Running   2 (5d17h ago)   5d17h   10.244.61.218    k8s-01              
nginx-nginx                               1/1     Running   0               5d17h   10.244.179.3     k8s-02              
nginx-readiness                           1/1     Running   0               5d17h   10.244.61.219    k8s-01              
nginx-test                                1/1     Running   0               45m     10.244.7.81      k8s-04              
post-test                                 1/1     Running   0               5d18h   10.244.179.62    k8s-02              

相关内容

热门资讯

AWSECS:访问外部网络时出... 如果您在AWS ECS中部署了应用程序,并且该应用程序需要访问外部网络,但是无法正常访问,可能是因为...
AWSElasticBeans... 在Dockerfile中手动配置nginx反向代理。例如,在Dockerfile中添加以下代码:FR...
银河麒麟V10SP1高级服务器... 银河麒麟高级服务器操作系统简介: 银河麒麟高级服务器操作系统V10是针对企业级关键业务...
北信源内网安全管理卸载 北信源内网安全管理是一款网络安全管理软件,主要用于保护内网安全。在日常使用过程中,卸载该软件是一种常...
AWR报告解读 WORKLOAD REPOSITORY PDB report (PDB snapshots) AW...
AWS管理控制台菜单和权限 要在AWS管理控制台中创建菜单和权限,您可以使用AWS Identity and Access Ma...
​ToDesk 远程工具安装及... 目录 前言 ToDesk 优势 ToDesk 下载安装 ToDesk 功能展示 文件传输 设备链接 ...
群晖外网访问终极解决方法:IP... 写在前面的话 受够了群晖的quickconnet的小水管了,急需一个新的解决方法&#x...
不能访问光猫的的管理页面 光猫是现代家庭宽带网络的重要组成部分,它可以提供高速稳定的网络连接。但是,有时候我们会遇到不能访问光...
Azure构建流程(Power... 这可能是由于配置错误导致的问题。请检查构建流程任务中的“发布构建制品”步骤,确保正确配置了“Arti...