PNF failed after FIP detached

Description

Reproduction steps:
1. External NW creation
2. Internal NW creation
3. Router creation and GW/IF setting
4. VM creation
5. SNAT confirmation: OK
6. FIP attach
7. DNAT confirmation: OK
8. FIP detach
9. SNAT confirmation: NG (failed)

Environment

ODL: Nitrogen-SR1, 3 nodes
OpenStack: Pike 3 nodes (1 controller, 2 compute)

Activity

Vinh.Nguyen March 29, 2018 at 12:12 AM

Sorry, the previous analysis was incorrect.

Revised analysis:

I found that the problem occurs only when both of the following hold:

  • NAT conntrack mode is used.

  • The VM whose FIP is deleted is on the NAPT switch.

In conntrack mode, the vpn-to-dpn-list entry for the external subnet on each compute DPN contains a single IP address: the FIP. The external router GW interface IP is contained in the vpn-to-dpn-list entry for the control node.

If the FIP on the NAPT switch is deleted, it is the last address in that DPN's vpn-to-dpn-list entry, so fibManager.cleanUpDpnForVpn is invoked and, as a result, the PNF flows are removed from the DPN.

NAPT controller mode doesn't have this issue because the external router GW interface IP is
contained in the vpn-to-dpn-list entry of the NAPT switch. Deleting the last FIP therefore does not
invoke fibManager.cleanUpDpnForVpn, since the router GW interface IP still remains in that DPN's entry.
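The rule described above can be illustrated with a toy model. It is a minimal sketch and not the netvirt VpnFootprintService code. `VpnFootprint`, `cleanup_after_fip_delete` and the `"router-gw-ip"` placeholder are hypothetical. The only behaviour modelled is "removing a DPN's last address triggers the cleanup". The DPN IDs and the FIP are taken from the SandboxJobs/job1 log quoted in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class VpnFootprint:
    """Hypothetical, simplified vpn-to-dpn-list for one VPN: DPN id -> addresses."""
    dpn_ips: dict[int, set[str]] = field(default_factory=dict)
    cleaned_up: list[int] = field(default_factory=list)

    def add_ip(self, dpn_id: int, ip: str) -> None:
        self.dpn_ips.setdefault(dpn_id, set()).add(ip)

    def remove_ip(self, dpn_id: int, ip: str) -> None:
        ips = self.dpn_ips.get(dpn_id)
        if ips is None:
            return
        ips.discard(ip)
        if not ips:
            # Last address gone: stand-in for fibManager.cleanUpDpnForVpn,
            # which also removes the PNF flows from this DPN.
            del self.dpn_ips[dpn_id]
            self.cleaned_up.append(dpn_id)


CONTROL = 223071002466895     # control node (SandboxJobs/job1 log)
NAPT_SWITCH = 73535277218113  # compute1, NAPT switch, FIP 10.10.10.13


def cleanup_after_fip_delete(gw_on_napt_switch: bool) -> list[int]:
    """Return the DPNs that would be cleaned up after deleting the NAPT-switch FIP."""
    fp = VpnFootprint()
    # Placeholder label: the router GW IP itself is not listed in this comment.
    fp.add_ip(NAPT_SWITCH if gw_on_napt_switch else CONTROL, "router-gw-ip")
    fp.add_ip(NAPT_SWITCH, "10.10.10.13/32")
    fp.remove_ip(NAPT_SWITCH, "10.10.10.13/32")
    return fp.cleaned_up


print(cleanup_after_fip_delete(gw_on_napt_switch=False))  # conntrack mode: [73535277218113]
print(cleanup_after_fip_delete(gw_on_napt_switch=True))   # controller mode: []
```

The only difference between the two calls is which DPN holds the GW address. That is exactly the conntrack vs. controller-mode distinction drawn above.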

The following is the vpn-instance-op-data-entry for the external subnet when the FIPs are added:

Conntrack mode:
From the log for SandboxJobs/job1, the DPN IDs of the nodes are:

  • control: 223071002466895, hosts the external router GW interface 189fdbb1-eab1-4108-9b2a-bff343503552

  • compute1: 73535277218113 - NAPT switch, FIP: 10.10.10.13

  • compute2: 116882536471118 - non NAPT switch, FIP: 10.10.10.4

JamO Luhrsen March 13, 2018 at 12:01 AM

Reading the commit message in the patch, it makes sense why we could lose connectivity (the flow is removed),
but I think connectivity does eventually return. What is making that happen?

commit message:

Problem:
Deleting the last FIP port on a dpn also deletes the PNF flow
entries on the OVS node.
Solution:
Don't invoke fibManager.cleanUpDpnForVpn (which removes
the PNF flows) when the last port on the external subnet vpn is
deleted on the dpn.

Vinh.Nguyen March 12, 2018 at 10:18 PM

Patch: 

https://git.opendaylight.org/gerrit/#/c/69102/

CSIT verification pending.

Vinh.Nguyen March 12, 2018 at 8:03 PM

Updated the title to 'PNF failed after FIP Detached'.
Reason: Based on the attached CSIT report, the SNAT TCP/UDP connection verification passed;
the failure was in the PNF verification after the FIP was detached.

Investigation:

Three nodes: control, compute1, compute2.

1. External NW creation
2. Internal NW creation
3. Router creation and GW/IF setting
4. VMs creation: VM1 on the compute1 node, VM2 on the compute2 node

The PNF SubnetRoute flow entries are installed on all 3 nodes:

cookie=0x8000003, duration=339.441s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x138c030d42/0xfffffffffe,goto_table:22
cookie=0x8000004, duration=339.441s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.255 actions=drop
cookie=0x8000004, duration=902.422s, table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535
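A small check helps confirm whether a node still carries the table 21 SubnetRoute entry before and after step 8. Below is a minimal sketch that assumes the input is `ovs-ofctl dump-flows` text in the format shown above. The helper `has_pnf_subnet_route` is hypothetical, and the sample lines are copied verbatim from this comment.

```python
import re

FLOWS = """\
cookie=0x8000003, duration=339.441s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x138c030d42/0xfffffffffe,goto_table:22
cookie=0x8000004, duration=339.441s, table=22, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x30d42/0xfffffe,nw_dst=10.10.10.255 actions=drop
cookie=0x8000004, duration=902.422s, table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535
"""


def has_pnf_subnet_route(flow_dump: str, subnet: str) -> bool:
    """True if some table=21 flow matches nw_dst=<subnet> and continues to table 22."""
    table_re = re.compile(r"(?:^|[\s,])table=21,")
    # Lookahead stops a prefix such as 10.10.10.0/2 from matching 10.10.10.0/24.
    dst_re = re.compile(r"nw_dst=" + re.escape(subnet) + r"(?=[\s,]|$)")
    for line in flow_dump.splitlines():
        if table_re.search(line) and dst_re.search(line) and "goto_table:22" in line:
            return True
    return False


print(has_pnf_subnet_route(FLOWS, "10.10.10.0/24"))  # True
```

Suppose the behaviour described below is present. Then running the check on the VM's node after the FIP detach should return False, while the other nodes still return True.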

5. SNAT confirmation
6. FIPs attach
7. DNAT confirmation
8. FIPs for VMs detach
9. VM pings external PNF instances: FAILED

Problem:

After the FIP is detached (step 8), the PNF subnetRoute flow entries are removed on the OVS node that hosts the VM.
Hence traffic from the VM on that OVS node to the PNF instance is no longer possible.

The PNF subnetRoute flow entries are removed on the dpn when the FIP port is the last port for the VPN on that dpn:

https://github.com/opendaylight/netvirt/blob/master/vpnmanager/impl/src/main/java/org/opendaylight/netvirt/vpnmanager/VpnFootprintService.java#L352

The VpnToDpnList for the external subnet after the FIPs are attached (step 6):
{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        {
          "ip-address": "192.168.56.18/32",
          "ip-address-source": "ExternalFixedIP"
        },
        {
          "ip-address": "192.168.56.13/32",
          "ip-address-source": "FloatingIP"
        }
      ]
    },
    {
      "dpn-state": "active",
      "dpnId": 8796748560798,
      "ip-addresses": [
        {
          "ip-address": "192.168.56.17/32",
          "ip-address-source": "FloatingIP"
        }
      ]
    }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}

Notes:

  • Two compute nodes, DPN IDs: 8796751999625, 8796748560798

  • Ports on 8796751999625:
    + 192.168.56.18: router external GW interface
    + 192.168.56.13: FIP for VM1

  • Ports on 8796748560798:
    + 192.168.56.17: FIP for VM2
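Given the entry above, you can predict which DPNs will lose their entry once every FIP is detached. These are the DPNs whose addresses are all of source `FloatingIP`. The sketch below follows the analysis in this comment. The JSON is a trimmed copy of the step 6 entry, and the helper `dpns_emptied_by_fip_detach` is hypothetical.

```python
import json

# Trimmed copy of the vpn-instance-op-data-entry after step 6 (from this comment).
ATTACHED = json.loads("""
{
  "vpn-id": 100012,
  "vpn-to-dpn-list": [
    {"dpn-state": "active", "dpnId": 8796751999625,
     "ip-addresses": [
       {"ip-address": "192.168.56.18/32", "ip-address-source": "ExternalFixedIP"},
       {"ip-address": "192.168.56.13/32", "ip-address-source": "FloatingIP"}]},
    {"dpn-state": "active", "dpnId": 8796748560798,
     "ip-addresses": [
       {"ip-address": "192.168.56.17/32", "ip-address-source": "FloatingIP"}]}
  ]
}
""")


def dpns_emptied_by_fip_detach(entry: dict) -> list[int]:
    """DPNs whose vpn-to-dpn-list entry holds only FloatingIP addresses.

    Detaching every FIP leaves these entries empty. Per this analysis, that is
    the condition under which fibManager.cleanUpDpnForVpn runs for the DPN.
    """
    emptied = []
    for dpn in entry.get("vpn-to-dpn-list", []):
        sources = {ip["ip-address-source"] for ip in dpn.get("ip-addresses", [])}
        if sources and sources <= {"FloatingIP"}:
            emptied.append(dpn["dpnId"])
    return emptied


print(dpns_emptied_by_fip_detach(ATTACHED))  # [8796748560798]
```

The prediction matches the step 8 entry: 8796748560798 ends up inactive with no addresses, while 8796751999625 keeps its ExternalFixedIP.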

The VpnToDpnList for the external subnet after the FIPs are detached (step 8):

{
  "vpn-id": 100012,
  "vpn-instance-name": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761",
  "vpn-state": "created",
  "vpn-to-dpn-list": [
    {
      "dpn-state": "active",
      "dpnId": 8796751999625,
      "ip-addresses": [
        {
          "ip-address": "192.168.56.18/32",
          "ip-address-source": "ExternalFixedIP"
        }
      ]
    },
    {
      "dpn-state": "inactive",
      "dpnId": 8796748560798
    }
  ],
  "vrf-id": "ddf97de4-0a2d-48a8-b7d3-af8ffdae6761"
}

After the FIP on 8796748560798 is detached, that DPN's vpn-to-dpn-list entry for the external subnet VPN is empty
(the DPN becomes inactive), so fibManager.cleanUpDpnForVpn is called and cleans up the PNF flow entries.

https://github.com/opendaylight/netvirt/blob/master/vpnmanager/impl/src/main/java/org/opendaylight/netvirt/vpnmanager/VpnFootprintService.java#L354

Suggested solution:

  • FibManager.cleanUpDpnForVpn cleans up the flow entries associated with a VPN, such as
    SubnetRoute, Broadcast, etc. For an internal VPN, these flow entries are created when at
    least one VPN interface exists on the VPN and should be removed when the last VPN
    interface is removed.

  • For an external subnet VPN, the flow entries mentioned above are created when the subnet is created.

  • Therefore, when the last VPN interface on an external subnet VPN is deleted, simply remove the
    VpnToDpnList entry associated with the VPN. The DPN cleanup for the external subnet VPN will be
    done when the external subnet is deleted.
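The lifecycle proposed above can be sketched as a small state model. For an internal VPN, flows follow the VPN interfaces on each DPN. For the external subnet VPN, flows live from subnet creation until subnet deletion, and removing the last interface only drops the VpnToDpnList entry. This is an illustration under those assumptions, not the merged patch. `VpnState` and the four functions are hypothetical, and the DPN IDs in the usage lines are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class VpnState:
    external_subnet_vpn: bool
    dpn_ips: dict[int, set[str]] = field(default_factory=dict)  # VpnToDpnList
    flows_on: set[int] = field(default_factory=set)  # DPNs with SubnetRoute/Broadcast flows


def create_external_subnet(vpn: VpnState, dpns: set[int]) -> None:
    # External subnet VPN: flows are installed when the subnet is created.
    vpn.flows_on |= dpns


def add_interface(vpn: VpnState, dpn: int, ip: str) -> None:
    vpn.dpn_ips.setdefault(dpn, set()).add(ip)
    if not vpn.external_subnet_vpn:
        # Internal VPN: flows exist while at least one VPN interface is on the DPN.
        vpn.flows_on.add(dpn)


def remove_interface(vpn: VpnState, dpn: int, ip: str) -> None:
    ips = vpn.dpn_ips.get(dpn)
    if ips is None:
        return
    ips.discard(ip)
    if ips:
        return
    del vpn.dpn_ips[dpn]  # always drop the emptied VpnToDpnList entry
    if not vpn.external_subnet_vpn:
        vpn.flows_on.discard(dpn)  # stand-in for fibManager.cleanUpDpnForVpn


def delete_external_subnet(vpn: VpnState) -> None:
    # Deferred DPN cleanup for the external subnet VPN.
    vpn.flows_on.clear()


ext = VpnState(external_subnet_vpn=True)
create_external_subnet(ext, {1, 2, 3})     # placeholder DPN IDs
add_interface(ext, 2, "192.168.56.17/32")  # FIP attach
remove_interface(ext, 2, "192.168.56.17/32")  # FIP detach
print(sorted(ext.flows_on))  # [1, 2, 3]
```

Keeping the external-subnet flows tied to the subnet rather than to interfaces is what lets PNF traffic continue via SNAT after the FIP is detached.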

Vinh.Nguyen March 10, 2018 at 12:56 AM

This issue is not related to the PNF/SNAT issue in the recent CSIT runs. It was found in a sandbox where extra tests were added to the end of the current external-network test cases. The additional test cases are:

  • Delete the FIP for VM instance1: PASS

  • SNAT TCP connection to External Gateway From VM Instance1 : PASS

  • SNAT UDP connection to External Gateway From VM Instance1 : PASS

  • Ping External Network PNF from Vm Instance 1: FAIL

Here, the PNF ping fails after the FIP is deleted. We would expect the PNF scenario to continue working via SNAT.

Done

Details

Created February 6, 2018 at 12:30 PM
Updated May 15, 2018 at 2:18 PM
Resolved May 15, 2018 at 2:18 PM