CUDA profiling with shared libraries: Profiler + CUDA in .so files

I am trying to profile some CUDA code that is wrapped in a shared library. Setting the CUDA_PROFILE environment variable does not produce cuda_profile.log in the working directory in that case. Is there a way to profile CUDA calls made inside shared libraries (.so)? It would be good if I could at least confirm that the CUDA calls went through. Sometimes the cuda_profile.log file is the only way I find out that a kernel failed to execute on the GPU…

Personally, I use CUDA from MATLAB, so MATLAB loads a shared library that calls CUDA. When I run MATLAB from the Visual Profiler, I get profiles just fine. Be careful, though: if your program changes directories, the profile may be written to a different directory than the one you expect.
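If the log is going missing because of a directory change, one thing to try is pinning it to an absolute path. Below is a minimal sketch in C. It assumes the legacy command-line profiler, which as far as I know honours CUDA_PROFILE_LOG in addition to CUDA_PROFILE, and it assumes the variables are read when CUDA initializes, so the helper has to run before the first CUDA call in the process. The function name pin_cuda_profile_log is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose setenv() and getcwd() */
#include <limits.h>   /* PATH_MAX */
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* setenv */
#include <unistd.h>   /* getcwd */

/* Hypothetical helper: enable the legacy profiler and point its log at an
 * absolute path built from the current working directory, so a later
 * chdir() in the host program cannot move the log somewhere else.
 * Assumption: this runs before the first CUDA call in the process.
 * Returns 0 on success, -1 on failure. */
int pin_cuda_profile_log(const char *file_name)
{
    char cwd[PATH_MAX];
    char log_path[2 * PATH_MAX];

    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;

    /* Join directory and file name; fail rather than truncate. */
    int n = snprintf(log_path, sizeof log_path, "%s/%s", cwd, file_name);
    if (n < 0 || (size_t)n >= sizeof log_path)
        return -1;

    if (setenv("CUDA_PROFILE", "1", 1) != 0)
        return -1;
    return setenv("CUDA_PROFILE_LOG", log_path, 1);
}
```

Calling pin_cuda_profile_log("cuda_profile.log") at the very start of the host program, before the shared library is loaded, should then leave the log in the starting directory. When the host is something like MATLAB that you cannot modify, exporting the same two variables in the shell before launching it is the equivalent.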