PNF failed after FIP detached
Description
Environment
ODL:Nitrogen-SR1 3 nodes
OpenStack: Pike 3 nodes (1 controller, 2 compute)
Activity

Vinh.Nguyen March 29, 2018 at 12:12 AM (edited)
Sorry, the previous analysis is incorrect.
Revised analysis:
I found that the problem occurs only when:
NAT conntrack mode is used.
The deleted FIP VM is on the NAPT switch.
The vpn-to-dpn-list for the external subnet on each compute dpn contains a single IP address:
the FIP. The external router GW interface is contained in the vpn-to-dpn-list on the control node.
If the FIP on the NAPT switch is deleted, it is the last address on that dpn's
vpn-to-dpn-list, so fibManager.cleanUpDpnForVpn is invoked and the PNF flows are
removed from the dpn as a result.
NAPT controller mode doesn't have this issue because the external router GW interface IP is
contained in the vpn-to-dpn-list of the NAPT switch. Thus deleting the last FIP will not
invoke fibManager.cleanUpDpnForVpn, since the router GW interface IP still exists in the dpn.
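The difference between the two NAT modes can be illustrated with a small model. This is only a sketch: the function `removing_triggers_cleanup`, the dictionary layout, and the labels "FIP" and "router-gw" are hypothetical and do not mirror the actual NetVirt code. It only assumes, as described above, that the cleanup runs when the last address of a dpn is removed.

```python
# Hypothetical model: each dpn maps to the addresses listed for it in the
# external subnet's vpn-to-dpn-list. Names and layout are illustrative only.

def removing_triggers_cleanup(vpn_to_dpn, dpn_id, ip):
    """Remove `ip` from `dpn_id` and return True if no address is left,
    i.e. the condition under which cleanUpDpnForVpn would be invoked."""
    addresses = vpn_to_dpn[dpn_id]
    addresses.remove(ip)
    return len(addresses) == 0

# Conntrack mode: the NAPT switch lists only the FIP; the router GW
# interface is listed on the control node.
conntrack = {"napt-switch": ["FIP"], "control": ["router-gw"]}

# Controller mode: the NAPT switch also lists the router GW interface IP.
controller = {"napt-switch": ["router-gw", "FIP"]}

conntrack_cleanup = removing_triggers_cleanup(conntrack, "napt-switch", "FIP")
controller_cleanup = removing_triggers_cleanup(controller, "napt-switch", "FIP")
print(conntrack_cleanup, controller_cleanup)  # True False
```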
The following is the vpn-instance-op-data-entry for the external subnet when the FIPs are added:
Conntrack mode:
From the log for SandboxJobs/job1, the dpnid of nodes are:
control: 223071002466895, 189fdbb1-eab1-4108-9b2a-bff343503552: external router gw interface
compute1: 73535277218113 - NAPT switch, FIP: 10.10.10.13
compute2: 116882536471118 - non NAPT switch, FIP: 10.10.10.4
JamO Luhrsen March 13, 2018 at 12:01 AM
Reading the commit message in the patch, it makes sense why we
could lose connectivity (flow removed), but I think the connectivity does eventually return. What makes that
happen?
commit message:
Problem:
Deleting last FIP port on dpn also deletes the PNF flow
entries on the OVS node.
Solution:
Don't invoke fibManager.cleanUpDpnForVpn (which removes
the PNF flows) when last port on external subnet vpn is
deleted on the dpn.

Vinh.Nguyen March 12, 2018 at 10:18 PM

Vinh.Nguyen March 12, 2018 at 8:03 PM
Updated title to 'PNF failed after FIP detached'
Reason: Based on the attached CSIT report, the SNAT TCP/UDP connection verification passed;
the failure was in PNF verification after the FIP was detached.
Investigation:
Three nodes, control, compute1, compute2
1.External NW creation
2.Internal NW creation
3.Router creation and GW/IF setting
4.VMs creation: VM1 on compute1 node, VM2 on compute2 node
The PNF SubnetRoute flow entries are installed for ALL 3 nodes
cookie=0x8000003, duration=339.441s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x138c030d42/0xfffffffffe,goto_table:22
cookie=0x8000004, duration=339.441s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.255 actions=drop
cookie=0x8000004, duration=902.422s, table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535
5.SNAT confirmation
6.FIPs attach
7.DNAT confirmation
8.FIPs for VMs detach
9.VM pings external PNF instances: FAILED
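To check on each node whether the PNF SubnetRoute entry is still present after step 8, a flow dump can be scanned for the table 21 entry. The following is a minimal sketch, assuming the dump is available as text in the format shown in step 4; the helper `has_subnet_route` is a hypothetical name, not part of any existing tool.

```python
import re

# Flow lines copied from the dump in step 4.
FLOWS = """\
cookie=0x8000003, duration=339.441s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x138c030d42/0xfffffffffe,goto_table:22
cookie=0x8000004, duration=339.441s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.255 actions=drop
cookie=0x8000004, duration=902.422s, table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535
"""

def has_subnet_route(dump, subnet, table=21):
    """Return True if `dump` contains a flow in `table` matching nw_dst=`subnet`."""
    for line in dump.splitlines():
        in_table = re.search(r"\btable=(\d+)\b", line)
        dst = re.search(r"\bnw_dst=([0-9./]+)", line)
        if (in_table and dst
                and int(in_table.group(1)) == table
                and dst.group(1) == subnet):
            return True
    return False

present = has_subnet_route(FLOWS, "10.10.10.0/24")

# Simulate the state after the FIP detach by dropping the table 21 entry.
without_route = "\n".join(FLOWS.splitlines()[1:])
missing = not has_subnet_route(without_route, "10.10.10.0/24")
print(present, missing)  # True True
```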
Problem:
The PNF subnetRoute flow entries are removed on the OVS node that hosted the VM after detaching the FIP (in step 8).
Hence traffic from VM on that OVS node to the PNF instance is no longer possible.
The PNF subnetRoute flow entries are removed on the dpn when the FIP port is the last port for the VPN on that dpn:
The VpnToDpnList for the external subnet after FIP is attached (step 6):
{
"vpn-id": 100012,
"vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
"vpn-state": "created",
"vpn-to-dpn-list": [
{
"dpn-state": "active",
"dpnId": 8796751999625,
"ip-addresses": [
{
"ip-address": "192.168.56.18/32",
"ip-address-source": "ExternalFixedIP"
},
{
"ip-address": "192.168.56.13/32",
"ip-address-source": "FloatingIP"
}
]
},
{
"dpn-state": "active",
"dpnId": 8796748560798,
"ip-addresses": [
{
"ip-address": "192.168.56.17/32",
"ip-address-source": "FloatingIP"
}
]
}
],
"vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}
Notes:
Two compute nodes, dpnid: 8796751999625, 8796748560798
Ports on 8796751999625:
+ 192.168.56.18: router external GW interface
+ 192.168.56.13: FIP for VM1
Ports on 8796748560798:
+ 192.168.56.17: FIP for VM2
The VpnToDpnList for the external subnet after FIP is deleted (step 8):
{
"vpn-id": 100012,
"vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
"vpn-state": "created",
"vpn-to-dpn-list": [
{
"dpn-state": "active",
"dpnId": 8796751999625,
"ip-addresses": [
{
"ip-address": "192.168.56.18/32",
"ip-address-source": "ExternalFixedIP"
}
]
},
{
"dpn-state": "inactive",
"dpnId": 8796748560798
}
],
"vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}
After detaching the FIP on 8796748560798, the vpn-to-dpn-list entry of the external subnet VPN for that dpn is empty,
so fibManager.cleanUpDpnForVpn is called and cleans up the PNF flow entries.
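The two snapshots can be compared programmatically to spot the dpn that lost its last address. This is a sketch only: the JSON below is abbreviated from the dumps above (only the fields needed for the check are kept), and `dpns_without_addresses` is a hypothetical helper.

```python
import json

def dpns_without_addresses(entry):
    """Return the dpnIds whose vpn-to-dpn-list item has no ip-addresses."""
    return [d["dpnId"] for d in entry["vpn-to-dpn-list"]
            if not d.get("ip-addresses")]

# Abbreviated snapshot after the FIPs are attached (step 6).
after_attach = json.loads("""
{"vpn-to-dpn-list": [
  {"dpn-state": "active", "dpnId": 8796751999625, "ip-addresses": [
    {"ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP"},
    {"ip-address": "192.168.56.13/32", "ip-address-source": "FloatingIP"}]},
  {"dpn-state": "active", "dpnId": 8796748560798, "ip-addresses": [
    {"ip-address": "192.168.56.17/32", "ip-address-source": "FloatingIP"}]}]}
""")

# Abbreviated snapshot after the FIPs are detached (step 8).
after_detach = json.loads("""
{"vpn-to-dpn-list": [
  {"dpn-state": "active", "dpnId": 8796751999625, "ip-addresses": [
    {"ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP"}]},
  {"dpn-state": "inactive", "dpnId": 8796748560798}]}
""")

empty_before = dpns_without_addresses(after_attach)
empty_after = dpns_without_addresses(after_detach)
print(empty_before, empty_after)  # [] [8796748560798]
```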
Suggested solution:
Method FibManager.cleanUpDpnForVpn cleans up the flow entries associated with a VPN, such as
SubnetRoute, BroadCast, etc. For an internal VPN, these flow entries are created
when at least one VPN interface exists on the VPN and should be removed when the last VPN
interface is removed.
For an external subnet VPN, the flow entries mentioned above are created when the subnet is created.
Therefore, when deleting the last VPN interface on an external subnet VPN, simply remove the VpnToDpnList
associated with the VPN. The DPN cleanup for the external subnet VPN will be done when the external
subnet is deleted.
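The suggested behaviour can be outlined as follows. This is a hedged sketch in Python rather than the actual Java change: `on_last_vpn_interface_removed`, the `vpn` dictionary, and the `cleanup_dpn_for_vpn` callback are hypothetical stand-ins for the NetVirt code paths described above.

```python
def on_last_vpn_interface_removed(vpn, dpn_id, cleanup_dpn_for_vpn):
    """Handle removal of the last VPN interface of `vpn` on `dpn_id`.

    Returns True if the dpn flows were cleaned up, False if cleanup was
    deferred (external subnet VPN).
    """
    # In both cases the VpnToDpnList entry for this dpn is dropped.
    vpn["vpn_to_dpn"].pop(dpn_id, None)
    if vpn["external_subnet"]:
        # Flows were created with the subnet, so keep them until the
        # external subnet itself is deleted.
        return False
    cleanup_dpn_for_vpn(vpn["name"], dpn_id)
    return True

calls = []  # records (vpn name, dpn id) for every cleanup invocation

def record_cleanup(name, dpn_id):
    calls.append((name, dpn_id))

external = {"name": "ext-subnet-vpn", "external_subnet": True,
            "vpn_to_dpn": {8796748560798: []}}
internal = {"name": "internal-vpn", "external_subnet": False,
            "vpn_to_dpn": {8796748560798: []}}

external_cleaned = on_last_vpn_interface_removed(external, 8796748560798, record_cleanup)
internal_cleaned = on_last_vpn_interface_removed(internal, 8796748560798, record_cleanup)
print(external_cleaned, internal_cleaned, calls)
```

With this split, only the internal VPN triggers the flow cleanup; the external subnet VPN just loses its list entry.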

Vinh.Nguyen March 10, 2018 at 12:56 AM
This issue is not related to the PNF/SNAT issue in recent CSIT. The issue is found in sandbox where extra tests are added to the end of the current external-network test cases. The additional test cases are:
Delete the FIP for VM instance1: PASS
SNAT TCP connection to External Gateway From VM Instance1 : PASS
SNAT UDP connection to External Gateway From VM Instance1 : PASS
Ping External Network PNF from Vm Instance 1: FAIL
Here, the PNF ping fails when the FIP is deleted. We would expect the PNF scenario to continue working via SNAT.
The reproduction steps:
1.External NW creation
2.Internal NW creation
3.Router creation and GW/IF setting
4.VM creation
5.SNAT confirmation: OK
6.FIP attach
7.DNAT confirmation: OK
8.FIP detach
9.SNAT confirmation: NG