When using IPVS to load-balance a VIP, a common technique is to create a dummy network interface in Linux and bind the VIP to it.
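Outside of Kubernetes this can be done by hand; a minimal sketch, with 10.0.0.100 as a made-up VIP:

# create a dummy interface and bind the VIP to it
ip link add dummy0 type dummy
ip addr add 10.0.0.100/32 dev dummy0
# a matching entry appears in the local routing table even though the
# device stays DOWN (just like kube-ipvs0 below)
ip route show table local | grep 10.0.0.100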
kube-proxy does exactly this: in IPVS mode it creates a virtual interface named kube-ipvs0 and binds every Service IP to it:
[root@master140 ~]# ip addr
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether d2:b0:08:01:3e:52 brd ff:ff:ff:ff:ff:ff
    inet 172.18.13.31/32 brd 172.18.13.31 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.1/32 brd 172.18.13.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.187/32 brd 172.18.13.187 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.113/32 brd 172.18.13.113 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.222/32 brd 172.18.13.222 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
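That kube-ipvs0 is a dummy device can be confirmed with ip -d link (output trimmed; exact fields vary with the iproute2 version):

[root@master140 ~]# ip -d link show kube-ipvs0
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/ether d2:b0:08:01:3e:52 brd ff:ff:ff:ff:ff:ff promiscuity 0
    dummy ...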
It also installs a route in the local routing table for each of these VIPs:
[root@master140 ~]# ip route show table local
local 172.18.13.1 dev kube-ipvs0 proto kernel scope host src 172.18.13.1
local 172.18.13.31 dev kube-ipvs0 proto kernel scope host src 172.18.13.31
local 172.18.13.113 dev kube-ipvs0 proto kernel scope host src 172.18.13.113
local 172.18.13.187 dev kube-ipvs0 proto kernel scope host src 172.18.13.187
local 172.18.13.222 dev kube-ipvs0 proto kernel scope host src 172.18.13.222
And it programs the corresponding IPVS rules:
[root@master140 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  127.0.0.1:30001 rr
TCP  127.0.0.1:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  127.0.0.1:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  172.17.0.1:30001 rr
TCP  172.17.0.1:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  172.17.0.1:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  172.18.13.1:443 rr
  -> 192.168.204.142:6443         Masq    1      0          0
TCP  172.18.13.31:80 rr
TCP  172.18.13.113:8082 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  172.18.13.187:3306 rr
  -> 10.0.3.4:3306                Masq    1      0          0
TCP  172.18.13.222:80 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  192.168.204.142:30001 rr
TCP  192.168.204.142:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  192.168.204.142:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  10.0.1.0:30001 rr
TCP  10.0.1.0:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  10.0.1.0:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
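For comparison, the 172.18.13.222:80 service above could be created by hand with ipvsadm; a minimal sketch of the equivalent commands:

# add a TCP virtual service with round-robin scheduling
ipvsadm -A -t 172.18.13.222:80 -s rr
# add a real server in masquerading (NAT) mode with weight 1
ipvsadm -a -t 172.18.13.222:80 -r 10.0.3.2:80 -m -w 1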
First, a look at how IPVS forwards a packet in NAT mode.
The steps:
1. A client request reaches the Director Server and first hits the PREROUTING chain in kernel space. At this point the packet's source IP is the CIP (client IP) and its destination IP is the VIP.
2. PREROUTING finds that the packet's destination IP is a local address, so the packet is handed to the INPUT chain.
3. IPVS watches packets arriving at the INPUT chain and checks whether the requested service is a cluster service. If it is, IPVS rewrites the destination IP to a backend Real Server IP and passes the packet on to the POSTROUTING chain. Now the source IP is the CIP and the destination IP is the RIP (this rewrite can be observed directly, see the sketch after this list).
4. The POSTROUTING chain routes the packet to the Real Server.
5. The Real Server sees that the destination IP is its own, builds a response, and sends it back to the Director Server. Now the source IP is the RIP and the destination IP is the CIP.
6. Before responding to the client, the Director Server rewrites the source IP to its own VIP and sends the response out. Now the source IP is the VIP and the destination IP is the CIP.
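The rewrites in steps 3 and 6 show up in the IPVS connection table, which ipvsadm -lnc lists; a sketch of what an entry might look like (the client address 192.168.204.1 and port are made up):

[root@master140 ~]# ipvsadm -lnc
IPVS connection entries
pro expire state       source               virtual            destination
TCP 14:58  ESTABLISHED 192.168.204.1:52146  172.18.13.222:80   10.0.3.2:80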
The purpose of the new interface and routes:
Because IPVS attaches its DNAT hook to the INPUT chain, the kernel must recognize the VIP as a local IP; only then does the packet traverse INPUT. Otherwise the routing decision after PREROUTING would send it down the FORWARD chain and the IPVS hook would never see it. Kubernetes arranges this by binding every Service cluster IP to the virtual interface kube-ipvs0.
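Whether the kernel treats an address as local (and thus delivers it to INPUT) can be checked with ip route get; for one of the Service IPs above, the lookup resolves via the local table (output approximate):

[root@master140 ~]# ip route get 172.18.13.222
local 172.18.13.222 dev lo table local src 172.18.13.222 uid 0
    cache <local>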
Taking flannel as the CNI as an example: with the dummy interface in place, traffic destined for a Service IP now enters the INPUT chain, so the IPVS hook fires and performs the DNAT.
The IPVS hooks:
The concrete hook definitions, from net/netfilter/ipvs/ip_vs_core.c in the kernel source. Note the priorities: NF_IP_PRI_NAT_DST is -100 and NF_IP_PRI_NAT_SRC is 100, so on LOCAL_IN the IPVS hooks run at priority 98 and 99, i.e. after the iptables filter table (priority 0) but before source NAT:
static const struct nf_hook_ops ip_vs_ops[] = {
	/* After packet filtering, change source only for VS/NAT */
	{
		.hook		= ip_vs_reply4,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_NAT_SRC - 2,
	},
	/* After packet filtering, forward packet through VS/DR, VS/TUN,
	 * or VS/NAT(change destination), so that filtering rules can be
	 * applied to IPVS. */
	{
		.hook		= ip_vs_remote_request4,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_NAT_SRC - 1,
	},
	/* Before ip_vs_in, change source only for VS/NAT */
	{
		.hook		= ip_vs_local_reply4,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_OUT,
		.priority	= NF_IP_PRI_NAT_DST + 1,
	},
	/* After mangle, schedule and forward local requests */
	{
		.hook		= ip_vs_local_request4,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_OUT,
		.priority	= NF_IP_PRI_NAT_DST + 2,
	},
	/* After packet filtering (but before ip_vs_out_icmp), catch icmp
	 * destined for 0.0.0.0/0, which is for incoming IPVS connections */
	{
		.hook		= ip_vs_forward_icmp,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_FORWARD,
		.priority	= 99,
	},
	/* After packet filtering, change source only for VS/NAT */
	{
		.hook		= ip_vs_reply4,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_FORWARD,
		.priority	= 100,
	},
...