TX2 - JetPack 3.3 - TensorRT - buildCudaEngine fails

I have two deep network structures, both using TensorRT and both containing IPlugin layers. They worked well with JetPack 3.1 on the TX2. I upgraded to JetPack 3.3 without installing OpenCV and VisionWorks, and then installed OpenCV 3.4 (it is unrelated to the networks but needed for my project). Now one of the networks works well again, but the other returns an error in TensorRT's buildCudaEngine method. The details of the error are shown below:

[GIE] Original: 29 layers
[GIE] After dead-layer removal: 29 layers
[GIE] After scale fusion: 29 layers
[GIE] Fusing fc6-1 with relu6
[GIE] Fusing fc7 with relu7
[GIE] After vertical fusions: 27 layers
[GIE] After swap: 27 layers
[GIE] After final dead-layer removal: 27 layers
[GIE] After tensor merging: 27 layers
[GIE] After concat removal: 27 layers
[GIE] Graph costruction and optimization completed in 0.00105677 seconds.
[GIE] cudnnEngine.cpp (87) - Cuda Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
[GIE] cudnnEngine.cpp (87) - Cuda Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
[GIE] failed to build CUDA engine

I don’t know whether this is a problem with TensorRT, cuDNN, or the TX2, so I am posting it here.
I would appreciate any help from anyone who knows a solution, or the meaning and cause of the error, so that I can check things in my code.

Thanks in advance.

Hello,

CUDA error 4 usually points to a memory dereference error.

An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. All existing device memory allocations are invalid. To continue using CUDA, the process must be terminated and relaunched.

Consider trying the example code to rule out JetPack 3.3 installation/configuration issues (unlikely, since one of your networks already works with JetPack 3.3). Also review your code for possible dereference issues.

If you continue to see this error, please share a small repro package (containing source, model, and dataset), which would help us debug.
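
One way to narrow down where a dereference problem starts is to check the CUDA error state right after each step of your own plugin setup, before calling buildCudaEngine. Kernel faults can be reported asynchronously, so the call that prints the error is not necessarily the one that caused it. The sketch below is only an illustration: `cudaStateOk` and its `label` argument are hypothetical names, not part of TensorRT or of the original code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper (not a TensorRT or CUDA API): reports any pending
// CUDA error, tagged with a caller-supplied label.
// Returns true when no error is pending.
bool cudaStateOk(const char* label)
{
    // Wait for all queued work so asynchronous kernel failures surface here
    // instead of in some later, unrelated call.
    cudaError_t status = cudaDeviceSynchronize();
    if (status == cudaSuccess) {
        // Also pick up any error recorded by an earlier runtime call.
        status = cudaGetLastError();
    }
    if (status != cudaSuccess) {
        std::fprintf(stderr, "[%s] CUDA error %d: %s\n",
                     label, static_cast<int>(status), cudaGetErrorString(status));
        return false;
    }
    return true;
}
```

Calling something like `cudaStateOk("after plugin weight upload")` immediately before buildCudaEngine would show whether the device is already in an error state by the time TensorRT initializes cuDNN.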

Thanks for the reply, I will review my code and try the example code as well. I’ll let you know if I can solve the issue.

Hi again NVES,

I’ve solved the issue. I was using thrust::copy to transfer the weights to the plugin layer, and I realized that without thrust::copy I don’t get any error. Replacing thrust::copy with cudaMemcpy solved my issue. However, I still don’t know why thrust::copy causes an error on JetPack 3.3 but not on JetPack 3.1.
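
For anyone hitting the same problem, here is a minimal sketch of the kind of replacement I mean. It assumes float weights held in host memory that need to end up in a device buffer owned by the plugin; `copyWeightsToDevice` and its parameters are hypothetical names for illustration, not the actual code from my project.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: allocates a device buffer and copies `count` floats
// from host memory into it with cudaMemcpy (instead of thrust::copy).
// Returns the device pointer, or nullptr on failure. The caller is
// responsible for releasing the buffer with cudaFree.
float* copyWeightsToDevice(const float* hostWeights, std::size_t count)
{
    float* deviceWeights = nullptr;
    const std::size_t bytes = count * sizeof(float);

    cudaError_t status = cudaMalloc(reinterpret_cast<void**>(&deviceWeights), bytes);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(status));
        return nullptr;
    }

    // Blocking host-to-device copy; the size is given in bytes, not elements.
    status = cudaMemcpy(deviceWeights, hostWeights, bytes, cudaMemcpyHostToDevice);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(status));
        cudaFree(deviceWeights);
        return nullptr;
    }
    return deviceWeights;
}
```

Checking the return value of each call also makes it easier to see which step fails if the copy itself is the problem.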

Thanks again.