
OpenStack Study: Cross-Host VM Access (Part 2)

The previous article walked through what happened in the underlying Linux system's namespaces, Linux bridges, and OVS after a VM was created on each of the two nodes.

This article focuses on the traffic path between the VMs on the two nodes: in particular, the notoriously confusing OVS flow tables, and how the two nodes use a VXLAN network to connect the two VMs together.

[Figure: create_vm_with_2node]

br-tun

While collecting information earlier, you may have already noticed other changes in the Linux bridges and OVS. So, let's take a look at br-tun.

  • ovs
(node0)$ ovs-vsctl show
...
Bridge br-tun
...
Port "vxlan-ac0a000b"
Interface "vxlan-ac0a000b"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.10.0.10", out_key=flow, remote_ip="172.10.0.11"}
...

(node1)$ ovs-vsctl show
...
Bridge br-tun
...
Port "vxlan-ac0a000a"
Interface "vxlan-ac0a000a"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.10.0.11", out_key=flow, remote_ip="172.10.0.10"}
...

This interface is not created as soon as the environment is set up; the VXLAN VTEP interface is created only once a VM has been created.

[Figure: create_vm_with_2node_vxlan]
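
These VTEP ports are created by Neutron's OVS agent itself; for reference, a roughly equivalent port could be created by hand with ovs-vsctl (a sketch only, using node0's values from the output above):

(node0)$ ovs-vsctl add-port br-tun vxlan-ac0a000b -- \
    set interface vxlan-ac0a000b type=vxlan \
    options:local_ip=172.10.0.10 options:remote_ip=172.10.0.11 \
    options:in_key=flow options:out_key=flow options:df_default=true

Note that in_key=flow and out_key=flow mean the VNI is not fixed on the port: it is read from and written to the tun_id metadata by the flow tables, which is exactly what the load:0xf->NXM_NX_TUN_ID actions below do.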

Flow tables

node0

  • br-int
$ ovs-ofctl dump-flows br-int
table=0, ..., priority=65535,vlan_tci=0x0fff/0x1fff actions=drop
table=0, ..., priority=10,icmp6,in_port="qvo02407769-97",icmp_type=136 actions=resubmit(,24)
table=0, ..., priority=10,arp,in_port="qvo02407769-97" actions=resubmit(,24)
table=0, ..., priority=2,in_port="int-br-provider" actions=drop
table=0, ..., priority=2,in_port="int-br-ext" actions=drop
table=0, ..., priority=9,in_port="qvo02407769-97" actions=resubmit(,25)
table=0, ..., priority=0 actions=resubmit(,60)
table=23, ..., priority=0 actions=drop
table=24, ..., priority=2,icmp6,in_port="qvo02407769-97",icmp_type=136,nd_target=fe80::f816:3eff:fead:669e actions=resubmit(,60)
table=24, ..., priority=2,arp,in_port="qvo02407769-97",arp_spa=200.0.0.219 actions=resubmit(,25)
table=24, ..., priority=0 actions=drop
table=25, ..., priority=2,in_port="qvo02407769-97",dl_src=fa:16:3e:ad:66:9e actions=resubmit(,60)
table=60, ..., priority=3 actions=NORMAL

table 0

  1. ICMPv6 neighbor advertisements (icmp_type=136) and ARP packets coming in from qvo02407769-97 are sent to table 24
  2. Other packets coming in from qvo02407769-97 are sent to table 25
  3. All remaining packets are sent to table 60
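
When working through a dump like this, it helps to look at one table, or one port, at a time; ovs-ofctl accepts match fields as a filter:

# only the flows in table 0
$ ovs-ofctl dump-flows br-int table=0
# only the flows matching a given ingress port
$ ovs-ofctl dump-flows br-int in_port="qvo02407769-97"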

table 24

  1. ICMPv6 neighbor advertisements from qvo02407769-97 whose target is fe80::f816:3eff:fead:669e (vm-1's link-local IPv6 address) are sent to table 60
  2. ARP packets from qvo02407769-97 whose ARP source address is 200.0.0.219, i.e., packets from vm-1, are sent to table 25

table 25

Packets from qvo02407769-97 whose source MAC is fa:16:3e:ad:66:9e, i.e., packets from vm-1, are sent to table 60. Tables 24 and 25 together implement Neutron's ARP/MAC anti-spoofing: only traffic carrying vm-1's own addresses gets through.
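
The allowed MAC comes from the Neutron port; the OVS agent records it in the interface's external_ids, so it can be cross-checked directly (a quick verification, using the port name from above):

$ ovs-vsctl get Interface qvo02407769-97 external_ids:attached-mac
"fa:16:3e:ad:66:9e"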

table 60

Normal forwarding (the NORMAL action, i.e., standard MAC-learning switch behavior).

In summary, packets from vm-1 that pass the anti-spoofing checks, along with all other permitted traffic, are forwarded normally on br-int.
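
You can watch a packet walk through exactly this chain of tables with ofproto/trace (a sketch; the source MAC is vm-1's, and the destination MAC fa:16:3e:f5:ca:f5 is the one that appears in the br-tun flows below):

$ ovs-appctl ofproto/trace br-int \
    in_port="qvo02407769-97",dl_src=fa:16:3e:ad:66:9e,dl_dst=fa:16:3e:f5:ca:f5,ip

The trace output lists every table hit and resubmit (0 -> 25 -> 60 for this packet), which is far easier than replaying the dump by hand.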

  • br-tun
$ ovs-ofctl dump-flows br-tun
table=0, ..., priority=1,in_port="patch-int" actions=resubmit(,2)
table=0, ..., priority=1,in_port="vxlan-ac0a000b" actions=resubmit(,4)
table=0, ..., priority=0 actions=drop
table=2, ..., priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
table=2, ..., priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
table=3, ..., priority=0 actions=drop
table=4, ..., priority=1,tun_id=0xf actions=mod_vlan_vid:1,resubmit(,10)
table=4, ..., priority=0 actions=drop
table=6, ..., priority=0 actions=drop
table=10, ..., priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0x9c915752f14d2544,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:"patch-int"
table=20, ..., priority=2,dl_vlan=1,dl_dst=fa:16:3e:f5:ca:f5 actions=strip_vlan,load:0xf->NXM_NX_TUN_ID[],output:"vxlan-ac0a000b"
table=20, ..., hard_timeout=300, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:f5:ca:f5 actions=load:0->NXM_OF_VLAN_TCI[],load:0xf->NXM_NX_TUN_ID[],output:"vxlan-ac0a000b"
table=20, ..., priority=0 actions=resubmit(,22)
table=22, ..., priority=1,dl_vlan=1 actions=strip_vlan,load:0xf->NXM_NX_TUN_ID[],output:"vxlan-ac0a000b"
table=22, ..., priority=0 actions=drop

table 0

  1. Packets from br-int (i.e., packets that need to leave this node) are sent to table 2
  2. Packets arriving on vxlan-ac0a000b (i.e., packets sent to node0 from the other node) are sent to table 4

table 2: packets about to leave this node

  1. Unicast packets (dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, i.e., the multicast bit of the destination MAC is 0) are sent to table 20
  2. Multicast and broadcast packets (dl_dst=01:00:00:00:00:00/01:00:00:00:00:00, i.e., the multicast bit is 1) are sent to table 22

table 4: packets arriving at this node from the other node

Packets with tun_id 15 (0xf) are tagged with VLAN 1 and sent to table 10.

That is, for packets coming out of the VXLAN tunnel, if the tunnel ID is 15, the VLAN is set to 1.

VLAN tag 1 is exactly the tag carried by qvo02407769-97 (connected to vm-1) and tap17a89323-fd (the DHCP tap interface).
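
The VLAN tag assigned to a port on br-int is stored in the OVSDB and can be read directly (using the port names above):

$ ovs-vsctl get Port qvo02407769-97 tag
1
$ ovs-vsctl get Port tap17a89323-fd tag
1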

table 10

learn(table=20,hard_timeout=300,priority=1,cookie=0x9c915752f14d2544,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:"patch-int"

This action looks daunting, but setting aside the contents of learn() for a moment, the packet is ultimately sent to br-int via output:"patch-int".

learn() installs a forwarding rule into table 20 for the return traffic: for each packet seen here, it adds a flow that matches the packet's source MAC as a destination and sends matching packets back out through the tunnel port they arrived on.
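
Read field by field, the learn() specification says the following (an annotated copy, not meant to be run as-is):

learn(
  table=20,                               # install the learned flow into table 20
  hard_timeout=300,                       # let it expire after 300 seconds
  priority=1,
  NXM_OF_VLAN_TCI[0..11],                 # match: the same VLAN ID as this packet
  NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],      # match: dst MAC equal to this packet's src MAC
  load:0->NXM_OF_VLAN_TCI[],              # action: strip the VLAN tag
  load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],  # action: reuse this packet's tunnel ID
  output:OXM_OF_IN_PORT[]                 # action: send out the port this packet came in on
)

The table 20 flow above with hard_timeout=300 and dl_dst=fa:16:3e:f5:ca:f5 is exactly such a learned entry.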

table 20

Table 20 forwards known unicast: the priority=2 flow sends packets destined for fa:16:3e:f5:ca:f5 out the tunnel, and the priority=1 flow with hard_timeout=300 is the entry learned in table 10. Anything that matches neither falls through the priority=0 flow to table 22.

table 22

Packets with VLAN tag 1 have the VLAN stripped and tun_id set to 15 (0xf), then are sent out vxlan-ac0a000b. This is the flood path for broadcast, multicast, and unknown unicast.

In summary:

  1. Packets from vm-1 carry VLAN tag 1; the tag is removed, tun_id is set to 15, and they are sent out the VXLAN interface
  2. Packets arriving from the other node with tun_id 15 are given VLAN tag 1 and sent to br-int
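
To see this encapsulation on the wire, capture VXLAN traffic (UDP port 4789 by default) on the underlay interface while the two VMs ping each other (a sketch; eth0 stands in for whatever interface carries 172.10.0.10):

(node0)$ tcpdump -n -i eth0 udp port 4789

tcpdump decodes the VXLAN header, so each line shows the VNI (15 here) followed by the inner Ethernet frame exchanged between the two VMs.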

Summary

We have now analyzed node0's flow tables in full; node1's are essentially the same, so we won't repeat the exercise.

This concludes the experiment and analysis of cross-node VM access. Along the way, you will have noticed that:

  1. A Linux bridge is created for every port that connects to a VM
  2. Each VM's interface on br-int is assigned a VLAN tag per subnet, and the tags differ from node to node
  3. When a VM's packets leave or enter the node, the VLAN tag and the VXLAN tun_id are translated into each other

Now, something to think about:

  1. How does vm-1 on node1 obtain its IP address via DHCP?
  2. Why isn't the VM attached directly to br-int, instead of going through a Linux bridge first?