Is there a big difference in implementation between __cudaUnregisterFatBinary() in NVCC 10.2 and in NVCC 10.0?
My program segfaults in __cudaUnregisterFatBinary() inside libcudart.so with NVCC 10.2, but works fine with NVCC 10.0.
I am calling __cudaRegisterFatBinary() from different .cpp files (I am doing function interposition, i.e. wrapping CUDA calls), and I make sure to pass the correct argument (void **fatCubinHandle). Is there a major difference between the two NVCC versions that could make my app fail?
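For reference, here is a minimal sketch of the kind of interposer described above. It assumes the wrappers live in a shared library loaded with LD_PRELOAD, and that the application links the shared runtime (e.g. built with `nvcc -cudart shared`). With nvcc's default static cudart, LD_PRELOAD cannot intercept these calls. The prototypes follow CUDA's crt/host_runtime.h, but these are undocumented internals that can change between releases. One difference worth checking if the wrapper code issues its own __cudaRegisterFatBinary() calls: as far as I know, toolkits from 10.1 onward also emit a __cudaRegisterFatBinaryEnd(handle) call after registration, which 10.0 did not have. The sketch forwards all three calls with the handle untouched and logs it, so mismatched handles show up in the output.

```c
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <stdio.h>

/* Prototypes as declared in CUDA's crt/host_runtime.h (undocumented internals). */
typedef void **(*register_fn)(void *fatCubin);
typedef void (*handle_fn)(void **fatCubinHandle);

/* Resolve the next definition of `name` in library search order,
   which is libcudart.so when this file is LD_PRELOADed. */
static void *next_symbol(const char *name) {
    void *sym = dlsym(RTLD_NEXT, name);
    if (sym == NULL)
        fprintf(stderr, "interposer: %s not found\n", name);
    return sym;
}

void **__cudaRegisterFatBinary(void *fatCubin) {
    static register_fn real = NULL;
    if (real == NULL)
        real = (register_fn)next_symbol("__cudaRegisterFatBinary");
    if (real == NULL)
        return NULL;
    /* Return the runtime's handle unchanged: the same pointer must later be
       passed to __cudaRegisterFatBinaryEnd and __cudaUnregisterFatBinary. */
    void **handle = real(fatCubin);
    fprintf(stderr, "interposer: registered handle %p\n", (void *)handle);
    return handle;
}

/* Forwarded explicitly so it can be logged; hand-written registration code
   built against newer toolkits would have to issue this call itself. */
void __cudaRegisterFatBinaryEnd(void **fatCubinHandle) {
    static handle_fn real = NULL;
    if (real == NULL)
        real = (handle_fn)next_symbol("__cudaRegisterFatBinaryEnd");
    if (real != NULL)
        real(fatCubinHandle);
}

void __cudaUnregisterFatBinary(void **fatCubinHandle) {
    static handle_fn real = NULL;
    if (real == NULL)
        real = (handle_fn)next_symbol("__cudaUnregisterFatBinary");
    fprintf(stderr, "interposer: unregistering handle %p\n", (void *)fatCubinHandle);
    if (real != NULL)
        real(fatCubinHandle);
}
```

A possible build and run, with hypothetical file names: `gcc -shared -fPIC -o libcudawrap.so wrap.c -ldl`, then `LD_PRELOAD=./libcudawrap.so ./app`. Comparing the "registered" and "unregistering" handle values in the log is a cheap way to confirm the wrappers hand the runtime back exactly what it returned.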
The only workaround I have left is to skip this call: inside my __cudaUnregisterFatBinary() wrapper, I return before calling the real cudart implementation. Could that affect GPU device memory?
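A minimal sketch of that workaround, assuming the same LD_PRELOAD setup with the shared libcudart. The unregister wrapper swallows the call when a hypothetical environment variable, INTERPOSER_SKIP_UNREGISTER, is set, and forwards it otherwise, so the two behaviours can be compared without rebuilding. As far as I know, the nvcc-generated host code normally triggers this call during process teardown (from an atexit handler). Skipping it therefore means the runtime's own cleanup for that module never runs. It hides the crash rather than explaining it.

```c
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*unregister_fn)(void **fatCubinHandle);

/* How many unregister calls were swallowed; handy for logging and tests. */
static int skipped_unregisters = 0;

/* Hypothetical opt-in switch: skip when the variable is set to a
   non-empty value that does not start with '0'. */
static int skip_requested(void) {
    const char *v = getenv("INTERPOSER_SKIP_UNREGISTER");
    return v != NULL && v[0] != '\0' && v[0] != '0';
}

void __cudaUnregisterFatBinary(void **fatCubinHandle) {
    if (skip_requested()) {
        /* Leave the module registered and return without calling cudart. */
        skipped_unregisters++;
        fprintf(stderr, "interposer: skipped unregister of %p\n", (void *)fatCubinHandle);
        return;
    }
    unregister_fn real = (unregister_fn)dlsym(RTLD_NEXT, "__cudaUnregisterFatBinary");
    if (real != NULL)
        real(fatCubinHandle);
}
```

Running the application once with the variable set and once without it gives a direct comparison. It also keeps the skip out of the default path, so the real teardown is not silently disabled for every run.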