Segfault at exit when __cudaUnregisterFatBinary is called

I am wrapping all CUDA runtime calls, including __cudaRegisterFatBinary and __cudaUnregisterFatBinary.
I do this because I load two CUDA libraries into the process's memory: one library stays idle and I target the other one.
All the wrapped calls work except __cudaUnregisterFatBinary.
I get a segfault when this call is invoked at exit. I store the fatCubinHandle returned by __cudaRegisterFatBinary in a global variable. When __cudaUnregisterFatBinary is called, my wrapper (which runs first) passes that saved fatCubinHandle to the real __cudaUnregisterFatBinary.
This still does not work, and it is frustrating. Any help would be much appreciated!
I suspect that __cudaUnregisterFatBinary inside libcudart.so calls another function that restores the faulty fatCubinHandle.
Thanks for your help!

[Solved]: For anyone curious, this was solved by calling __cudaRegisterFatBinaryEnd after __cudaRegisterFatBinary.
Some CUDA runtime versions require this call.
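
In case it helps someone, here is a minimal C sketch of the wrapper pattern with the fix applied. It is not my exact code. The names target_runtime, g_target, wrap_* and fake_* are made up for illustration, and the three signatures are my assumption about these internal, undocumented entry points. The fake_* functions only stand in for the targeted runtime so the sketch runs without a GPU. In a real interposer the three pointers would come from dlsym() on the handle of the targeted libcudart copy, and the wrappers would be exported under the real symbol names. Where exactly the End call has to sit relative to the other registration calls is not something I verified, so the sketch simply forwards it when the application makes it.

```c
#include <assert.h>
#include <stddef.h>

/* Entry points of the targeted runtime (signatures are an assumption).
   A real interposer would fill these in with dlsym() on the targeted
   libcudart handle. */
typedef struct {
    void **(*register_fat_binary)(void *fat_cubin);
    void   (*register_fat_binary_end)(void **handle); /* NULL if the runtime lacks it */
    void   (*unregister_fat_binary)(void **handle);
} target_runtime;

static target_runtime g_target;      /* hypothetical: the runtime copy we forward to */
static void **g_saved_handle = NULL; /* handle returned by the targeted runtime */

/* Wrapper for __cudaRegisterFatBinary: forward and remember the handle. */
void **wrap_register_fat_binary(void *fat_cubin) {
    assert(g_target.register_fat_binary != NULL);
    g_saved_handle = g_target.register_fat_binary(fat_cubin);
    return g_saved_handle;
}

/* Wrapper for __cudaRegisterFatBinaryEnd: this forwarding was the missing piece.
   The caller's argument is ignored in favour of the saved handle. */
void wrap_register_fat_binary_end(void **handle) {
    (void)handle;
    if (g_target.register_fat_binary_end != NULL && g_saved_handle != NULL)
        g_target.register_fat_binary_end(g_saved_handle);
}

/* Wrapper for __cudaUnregisterFatBinary: pass the saved handle, exactly once. */
void wrap_unregister_fat_binary(void **handle) {
    (void)handle;
    assert(g_target.unregister_fat_binary != NULL);
    if (g_saved_handle != NULL) {
        g_target.unregister_fat_binary(g_saved_handle);
        g_saved_handle = NULL;
    }
}

/* Stand-ins for the targeted runtime; they only record what was called. */
static void  *fake_slot;
static void **fake_last_unregistered;
static char   fake_log[16];
static size_t fake_log_len;

static void fake_log_push(char c) {
    if (fake_log_len + 1 < sizeof fake_log)
        fake_log[fake_log_len++] = c;
}
static void **fake_register(void *fat_cubin) {
    fake_slot = fat_cubin;
    fake_log_push('R');
    return &fake_slot;
}
static void fake_register_end(void **handle) {
    (void)handle;
    fake_log_push('E');
}
static void fake_unregister(void **handle) {
    fake_last_unregistered = handle;
    fake_log_push('U');
}

/* Point the wrappers at the stand-ins instead of a real libcudart. */
void install_fake_target(void) {
    g_target = (target_runtime){ fake_register, fake_register_end, fake_unregister };
}
```

To try it without CUDA, call install_fake_target(), then the three wrappers in the order register, register end, unregister. The unregister stand-in should receive the same handle that the register stand-in returned, no matter what argument the caller passed in.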