Update: as a further check, I tried blocking all signals in our app. This resulted in zero deadlocks in an overnight test, which makes a fairly strong case that the signal handler above is indeed the cause of our deadlocks.
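For anyone who wants to repeat the check, here is a minimal sketch of one way to block all signals. It assumes a POSIX threads app where the mask is set early in main(), before any other threads are created, so that they inherit it. The helper name block_all_signals is hypothetical, and this is not necessarily how our app does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical helper: block every blockable signal in the calling thread.
 * Threads created afterwards inherit this mask. SIGKILL and SIGSTOP cannot
 * be blocked; attempts to block them are silently ignored. */
static int block_all_signals(sigset_t *old_mask)
{
    sigset_t all;

    sigfillset(&all);
    /* Returns 0 on success or an error number on failure.
     * old_mask may be NULL if the previous mask is not needed. */
    return pthread_sigmask(SIG_BLOCK, &all, old_mask);
}
```

Note that POSIX leaves the behaviour undefined if SIGSEGV, SIGBUS, SIGFPE or SIGILL is generated by a real fault while blocked, which is one more reason to treat this as a diagnostic only.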
Although blocking signals stops the deadlocks, it’s not really a practical solution - it would be better to fix the underlying signal handling (in the Nvidia drivers?) so that other clients do not run into the same problem.
I will attempt to find out which signal is being raised. In the meantime, it’d be great to get some feedback from Nvidia or elsewhere on whether my assessment of the deadlocks is reasonable.
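One possible way to identify the signal, sketched under the assumption of a Linux/POSIX system where the signals of interest are already blocked in every thread: a dedicated thread waits with sigwaitinfo() and logs whatever arrives. The helper name log_next_signal is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: wait for one signal from `set` (which must already
 * be blocked in all threads) and log its number, name and origin.
 * Returns the signal number, or -1 on error. */
static int log_next_signal(const sigset_t *set)
{
    siginfo_t info;
    int signo = sigwaitinfo(set, &info);

    if (signo == -1) {
        perror("sigwaitinfo");
        return -1;
    }
    /* si_pid is only meaningful for signals sent via kill() or sigqueue()
     * (si_code SI_USER or SI_QUEUE). */
    fprintf(stderr, "got signal %d (%s), si_code=%d, si_pid=%ld\n",
            signo, strsignal(signo), info.si_code, (long)info.si_pid);
    return signo;
}
```

Calling this in a loop from its own thread, with the same set that was blocked at startup, should show which signal number turns up around the time the deadlock would otherwise occur.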
Many thanks.