“Better” testing results are not “better” if the underlying issue still exists and the error condition can still occur, which it does in your auto ISP functionality. The only thing you have successfully done is cover up an issue so that it happens less often in office testing but can still happen in a production scenario.
You all keep focusing on the wrong problem. The high CPU load is not the problem; it is an accelerant. The underlying problem still exists and is causing a lock-up in the ISP codebase. We are currently working with a “partner” of Nvidia’s on this issue and they are also not getting much support because of this problem.
As I have said before, we are in the process of changing our hardware so that we do not have to use the Argus API for ISP control. This is an absolute shame, because the rest of the functionality Nvidia provides is very high quality. However, we absolutely cannot have programs that segfault or die in any way under the covers without proper error recovery.