Deadlock risk with Session Lock & KeepaliveTask Synchronization

Description

If KeepaliveSalFacade tries to send a keepalive RPC when the NETCONF channel is disconnected, a deadlock can occur in the following scenario:

  1. Thread #1
    NetconfDeviceCommunicator#onSessionDown
    -> NetconfDeviceCommunicator#teardown sessionLock.lock()
    -> onRemoteSessionDown()
    -> KeepaliveSalFacade#onDeviceDisconnected
    -> stopKeepalives()
    -> KeepaliveTask#disableKeepalive synchronized KeepaliveTask

  2. Thread #2
    KeepaliveTask#sendKeepalive synchronized KeepaliveTask
    -> NetconfDeviceRpc#invokeNetconf
    -> NetconfDeviceDOMRpcService#invokeRpc
    -> NetconfDeviceCommunicator#sendRequest sessionLock.lock()
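
The lock-order inversion in the two call chains above can be reproduced in isolation. The following is a minimal sketch, not the actual NETCONF code: the class name LockOrderInversionDemo, the plain Object standing in for the KeepaliveTask monitor, and the 200 ms tryLock timeout are all assumptions made for illustration. The timeout replaces the blocking sessionLock.lock() of the real sendRequest so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionDemo {
    // Stand-ins for NetconfDeviceCommunicator.sessionLock and the KeepaliveTask monitor.
    private final ReentrantLock sessionLock = new ReentrantLock();
    private final Object keepaliveTask = new Object();
    // Released once each thread holds its first lock, which forces the bad interleaving.
    private final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    /** Returns true if the "sendRequest" side could not obtain sessionLock in time. */
    public boolean run() {
        AtomicBoolean timedOut = new AtomicBoolean(false);

        // Thread #1: teardown takes sessionLock, then needs the KeepaliveTask monitor.
        Thread teardown = new Thread(() -> {
            sessionLock.lock();
            try {
                awaitBoth();
                synchronized (keepaliveTask) {
                    // disableKeepalive would run here
                }
            } finally {
                sessionLock.unlock();
            }
        });

        // Thread #2: sendKeepalive takes the monitor, then needs sessionLock.
        Thread keepalive = new Thread(() -> {
            synchronized (keepaliveTask) {
                awaitBoth();
                try {
                    // Hypothetical timeout; the real code path blocks in lock() forever.
                    if (sessionLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                        sessionLock.unlock();
                    } else {
                        timedOut.set(true);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        teardown.start();
        keepalive.start();
        try {
            teardown.join();
            keepalive.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return timedOut.get();
    }

    private void awaitBoth() {
        bothHoldFirstLock.countDown();
        try {
            bothHoldFirstLock.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("sendRequest timed out: " + new LockOrderInversionDemo().run());
    }
}
```

In this sketch the keepalive thread gives up after the timeout and releases the monitor, which lets the teardown thread finish. With an unconditional lock() in place of tryLock, neither thread could make progress.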

Karaf logs:

  1. Thread #1

  2. Thread #2

Environment

None

Activity

Peter Šuňa yesterday

Based on the task description, two threads are blocked:

  1. NetconfDeviceCommunicator#teardown holds sessionLock and waits for KeepaliveTask#sendKeepalive to finish, that is, to release the KeepaliveTask lock.

  2. KeepaliveTask#sendKeepalive holds the KeepaliveTask lock and waits in NetconfDeviceCommunicator#sendRequest for NetconfDeviceCommunicator#teardown to release sessionLock.

The KeepaliveTask behavior is correct:
KeepaliveTask#disableKeepalive must run only after KeepaliveTask#sendKeepalive has finished, and vice versa. No part of either method may execute while the other is still in progress.

In NetconfDeviceCommunicator, on the other hand, the blocked methods are:
NetconfDeviceCommunicator#sendRequest and NetconfDeviceCommunicator#tearDown

These methods do not need to exclude each other entirely: parts of one can safely run while the other is in progress. The proposed solution is therefore to separate the NetconfClientSession handling from the RemoteDevice handling so that the two can run in parallel. This unblocks the NetconfDeviceCommunicator#sendRequest method.
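
One possible reading of this proposal is sketched below: the session-side state change stays under sessionLock, while the device-side notification (which ends up taking the KeepaliveTask lock) runs only after sessionLock has been released. This is an illustration under assumptions, not the actual patch. The class SplitTeardownSketch, the sessionUp flag, and the boolean return value of sendRequest are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class SplitTeardownSketch {
    private final ReentrantLock sessionLock = new ReentrantLock();
    private final Object keepaliveTask = new Object(); // stand-in for the KeepaliveTask monitor
    private boolean sessionUp = true;                  // hypothetical state, guarded by sessionLock

    // Session-side work happens under sessionLock; the device callback runs after release.
    void tearDown() {
        sessionLock.lock();
        try {
            sessionUp = false;
        } finally {
            sessionLock.unlock();
        }
        onRemoteSessionDown(); // no sessionLock held here
    }

    // Device-side work: eventually reaches disableKeepalive, which needs the monitor.
    void onRemoteSessionDown() {
        synchronized (keepaliveTask) {
            // disableKeepalive would run here
        }
    }

    // Returns whether the session was still up; holds sessionLock only briefly.
    boolean sendRequest() {
        sessionLock.lock();
        try {
            return sessionUp;
        } finally {
            sessionLock.unlock();
        }
    }

    /** Runs teardown and keepalive concurrently; returns true if both threads finish. */
    public boolean run() {
        CountDownLatch keepaliveHoldsMonitor = new CountDownLatch(1);

        Thread keepalive = new Thread(() -> {
            synchronized (keepaliveTask) {
                keepaliveHoldsMonitor.countDown();
                sendRequest(); // no longer blocked by a teardown waiting on the monitor
            }
        });
        Thread teardown = new Thread(() -> {
            try {
                keepaliveHoldsMonitor.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tearDown();
        });
        // Daemon threads so the JVM can exit even if the sketch were to hang.
        keepalive.setDaemon(true);
        teardown.setDaemon(true);
        keepalive.start();
        teardown.start();
        try {
            keepalive.join(2000);
            teardown.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !keepalive.isAlive() && !teardown.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + new SplitTeardownSketch().run());
    }
}
```

Because no thread in this sketch ever waits for the KeepaliveTask lock while holding sessionLock, the circular wait described above cannot form. KeepaliveTask keeps its existing synchronization unchanged.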

Details

Created last month
Updated yesterday