NvOf CUDA context

Our application uses NvDec and NvOf together with TensorRT-based inference on an RTX 4000. After recently upgrading from CUDA 10.2 to CUDA 11, we started seeing ‘random’ CUDA errors. After some investigation we traced this to the use of NvOf: stability returns if we replace NvOf with an alternative means of generating the optical flow.

In our code we have a single CUDA context for the whole application, including NvOf. Interestingly, if we create NvOf with its own CUDA context, everything is stable again, even though this means device-to-device copies between our ‘main’ context and the NvOf context (which I thought should not be allowed?). On the other hand, although it now runs stably, the entire application runs ~15% slower than either the unstable CUDA 11 build or our previous stable, customer-shipping CUDA 10.2 build, both of which use a single shared context. This is also puzzling, since the NvOf part of our app is quite small (much less than 15% of the GPU time).

My question for this forum: what is the intended, correct way to use NvOf? Is a separate CUDA context required? I have looked carefully at the docs and SDK and can’t find a clear answer.

Thanks for any thoughts on this!