CUDA application segfaults in __cudaUnregisterFatBinary() with NVCC 10.2 but not with NVCC 10.0

Is there a big difference in implementation between __cudaUnregisterFatBinary() in NVCC 10.2 and in NVCC 10.0?

My program segfaults in __cudaUnregisterFatBinary() (NVCC 10.2) inside libcudart.so, but works fine with the 10.0 version. I call __cudaUnregisterFatBinary() and __cudaRegisterFatBinary() from different .cpp files (I am doing function interposition, i.e. wrapping CUDA calls), and I make sure to pass the correct argument (void **fatCubinHandle). Is there a significant difference between the two NVCC versions that could make my app fail?

The only workaround I have left is to disable this call: once inside the __cudaUnregisterFatBinary() wrapper, I simply exit the wrapper before calling the real cudart implementation. Could that affect GPU device memory?
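
To make the setup concrete, here is a minimal sketch of this kind of interposition, including the early-exit workaround. It is an illustration under assumptions, not the exact code in question: the two signatures are assumed from CUDA's internal `crt/host_runtime.h` header (these entry points are undocumented and may change between toolkit versions), and the `next_symbol` helper, the two call counters, and the `SKIP_REAL_UNREGISTER` macro are hypothetical names invented for this example.

```cpp
// Sketch of an LD_PRELOAD-style interposer for the fat-binary (un)registration calls.
// Assumption: signatures match CUDA's internal crt/host_runtime.h.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes RTLD_NEXT on glibc
#endif
#include <dlfcn.h>
#include <cstdio>

namespace {

// Hypothetical helper: find the next definition of `name` in the library
// search order after this object (normally the one in libcudart.so).
template <typename Fn>
Fn next_symbol(const char* name) {
    return reinterpret_cast<Fn>(dlsym(RTLD_NEXT, name));
}

// Hypothetical counters, only here to make the wrappers observable.
int g_register_calls = 0;
int g_unregister_calls = 0;

}  // namespace

extern "C" void** __cudaRegisterFatBinary(void* fatCubin) {
    using Fn = void** (*)(void*);
    static Fn real = next_symbol<Fn>("__cudaRegisterFatBinary");
    ++g_register_calls;
    if (!real) {
        // No real implementation was found (e.g. libcudart is not loaded).
        std::fprintf(stderr, "wrapper: real __cudaRegisterFatBinary not found\n");
        return nullptr;
    }
    return real(fatCubin);  // forward the argument unchanged
}

extern "C" void __cudaUnregisterFatBinary(void** fatCubinHandle) {
    ++g_unregister_calls;
#ifdef SKIP_REAL_UNREGISTER
    // Workaround from the post: leave the wrapper without calling cudart.
    (void)fatCubinHandle;
#else
    using Fn = void (*)(void**);
    static Fn real = next_symbol<Fn>("__cudaUnregisterFatBinary");
    if (!real) {
        std::fprintf(stderr, "wrapper: real __cudaUnregisterFatBinary not found\n");
        return;
    }
    real(fatCubinHandle);  // must be the handle returned by the register call
#endif
}
```

A file like this would typically be built as a shared library (for example `g++ -shared -fPIC wrap.cpp -o libwrap.so -ldl`) and loaded with `LD_PRELOAD`, with the application linked against the shared `libcudart.so` so that the wrappers are resolved first. Building with `-DSKIP_REAL_UNREGISTER` gives the early-exit variant asked about above.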