We have a multi-threaded program that runs several image processing operations (including DNNs) in parallel. To ensure that each thread runs on the right GPU device, each thread calls cudaSetDevice before running its operations. In one of those threads, we load a TensorRT engine using the deserializeCudaEngine API, but it crashes with the following error:
Cuda error in file src/implicit_gemm.cu at line 648: cannot set while device is active in this process
nodelet: customWinogradConvActLayer.cpp:280: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `convolutions.back().get()’ failed.
When I say crash, I mean the entire process goes down on the failed assertion, without giving us a chance to handle the failure.
After debugging, I found that this seems to be caused by another thread calling cudaSetDevice at the same time. If I run this TensorRT initialization in a separate process, it works without any issues. Why is deserializeCudaEngine so sensitive to cudaSetDevice? Is there anything I can do to stop it from crashing?
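To make the setup concrete, here is a minimal sketch of the threading pattern described above, together with a process-wide lock that could be tried as an experiment. It is not our actual code and not a confirmed fix. `set_device_stub` and `load_engine_stub` are hypothetical stand-ins for cudaSetDevice and the deserializeCudaEngine call, so the sketch compiles without CUDA or TensorRT. `g_init_mutex`, `init_worker` and `run_workers` are also made-up names.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the per-thread "current device" that cudaSetDevice controls.
thread_local int g_current_device = -1;

// Hypothetical stand-in for cudaSetDevice(); returns 0 on success.
int set_device_stub(int device) {
    g_current_device = device;
    return 0;
}

// Hypothetical stand-in for the TensorRT engine load.
// Returns the device that was current in the calling thread.
int load_engine_stub() {
    return g_current_device;
}

// Process-wide lock. Assumption: serializing device selection and engine
// loading across threads is worth testing; it is not a verified fix.
std::mutex g_init_mutex;

// Per-thread initialization: select the device, then load the engine,
// with both steps performed while holding the lock.
int init_worker(int device) {
    std::lock_guard<std::mutex> lock(g_init_mutex);
    if (set_device_stub(device) != 0) {
        return -1;  // device selection failed
    }
    return load_engine_stub();
}

// Start one worker per device and record which device each one saw.
std::vector<int> run_workers(int num_devices) {
    std::vector<int> seen(num_devices, -1);
    std::vector<std::thread> threads;
    for (int d = 0; d < num_devices; ++d) {
        // Each thread writes only to its own element of `seen`.
        threads.emplace_back([&seen, d] { seen[d] = init_worker(d); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```

Calling `run_workers(n)` starts one worker per device index and returns the device each worker observed while loading. In a real program the stubs would be replaced by the actual CUDA and TensorRT calls, and every other cudaSetDevice call in the process would need to take the same lock for the experiment to mean anything.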