Hooking into CUDA calls

I am trying to intercept cudaMemcpy calls made by the PyTorch library so that I can analyze them. NVIDIA ships a cuHook example with the CUDA toolkit samples, but that example requires modifying the source code of the application itself, which I cannot do in this case. Is there a way to write a hook that intercepts CUDA calls without modifying the application's source code?
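
To make the goal concrete, here is a rough sketch of the kind of hook I have in mind, using LD_PRELOAD symbol interposition. It rests on several assumptions I have not verified: the platform is Linux, PyTorch loads the CUDA runtime (libcudart) as a shared library rather than linking it statically, and the copies of interest go through cudaMemcpy rather than cudaMemcpyAsync. The typedefs are stand-ins so the file builds without the CUDA headers, and hook_memcpy_calls and HOOK_NO_REAL_SYMBOL are names made up for illustration.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the CUDA runtime types (both are int-sized enums in the
   real headers), so this sketch compiles without the CUDA toolkit. */
typedef int cudaError_t;
typedef int cudaMemcpyKind;

/* Hypothetical sentinel returned when no real cudaMemcpy can be found. */
#define HOOK_NO_REAL_SYMBOL 999

typedef cudaError_t (*cudaMemcpy_fn)(void *, const void *, size_t,
                                     cudaMemcpyKind);

/* Hypothetical counter, kept only for analysis. */
unsigned long hook_memcpy_calls = 0;

/* Same name and signature as the runtime API function, so the dynamic
   linker resolves calls to this definition first when it is preloaded. */
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count,
                       cudaMemcpyKind kind)
{
    static cudaMemcpy_fn real_memcpy = NULL;

    /* Look up the next definition of cudaMemcpy in library search order. */
    if (real_memcpy == NULL)
        real_memcpy = (cudaMemcpy_fn)dlsym(RTLD_NEXT, "cudaMemcpy");

    hook_memcpy_calls++;
    fprintf(stderr, "[hook] cudaMemcpy: %zu bytes, kind=%d\n",
            count, (int)kind);

    /* No real implementation found: report failure instead of crashing. */
    if (real_memcpy == NULL)
        return HOOK_NO_REAL_SYMBOL;

    /* Forward the call unchanged to the real runtime function. */
    return real_memcpy(dst, src, count, kind);
}
```

With hypothetical file names, this would be built as a shared object (for example `gcc -shared -fPIC hook.c -o libhook.so -ldl`) and loaded with `LD_PRELOAD=./libhook.so python train.py`. If the runtime is statically linked into the PyTorch libraries, this kind of interposition would presumably never see the calls, which is part of what I am unsure about.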