Nsight 4.1 debugging fails with error "failed to allocate code patching memory from driver"

I’m working on an OpenCV application linked with CUDA 6.5. I’m using Nsight 4.1.0.14255. I get the above error when attempting to start debugging. The OpenCV application also gets a “device out of memory” error.

Nsight debugger output is here: https://dl.dropboxusercontent.com/u/64023107/Nsight.txt

The GPU is a GeForce 520 GT 512MB, device info: https://dl.dropboxusercontent.com/u/64023107/deviceQuery520.txt

I have not seen reports of this error before. Any idea what is going on?

Thanks,
Albert

hi ayc8,
The 520 GT only has 512 MB, and debugging adds some memory overhead of its own. Please try debugging your app on a more powerful GPU. If you run into any further issues, come back here.

victor

Sorry, this is an embedded platform and 520 GT is the only GPU I have available. Are you sure there is no issue with the toolkit? Using CUDA 4.2 and Nsight 3.1, I had no issues debugging on NVidia ION, which is an even more limited GPU.

Thanks,
Albert

There is no known bug in Nsight that would cause this, so you need to either reduce the memory footprint of the app or try a larger video card.
Debugging a GPU application can take several times as much video memory as just running the application.
Try turning off memory checking (if it’s on) in the debug session; this would reduce the memory footprint somewhat. Also double-check that you don’t have a resource leak in your app.
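
As a rough way to check the footprint and look for a leak, something like the sketch below could be called at a few points in the app. It only uses `cudaMemGetInfo` from the CUDA runtime. The helper name `reportDeviceMemory` and the 64 MB test allocation are made up for illustration and are not part of Nsight or OpenCV.

```cuda
// Minimal sketch: print free/total device memory at chosen points in the app.
// Build (assumed): nvcc meminfo.cu -o meminfo
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports how much device memory is currently free.
static void reportDeviceMemory(const char* label)
{
    size_t freeBytes = 0;
    size_t totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: cudaMemGetInfo failed: %s\n",
                     label, cudaGetErrorString(err));
        return;
    }
    const double mb = 1024.0 * 1024.0;
    std::printf("%s: %.1f MB free of %.1f MB\n",
                label, freeBytes / mb, totalBytes / mb);
}

int main()
{
    // The first runtime call creates the CUDA context, so this figure
    // already includes the context and any loaded kernel code.
    reportDeviceMemory("after context creation");

    // Stand-in for the application's own allocations (64 MB, arbitrary).
    void* buffer = 0;
    cudaError_t err = cudaMalloc(&buffer, 64u * 1024u * 1024u);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    reportDeviceMemory("after allocation");

    cudaFree(buffer);
    reportDeviceMemory("after free");
    return 0;
}
```

If the free figure keeps dropping across iterations of the same processing step, that would point to a leak. If it is already low right after context creation, the footprint is coming from what gets loaded at startup rather than from your own allocations.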

victor

On my machine Nsight cannot start at all.

2 GB of VRAM is not enough to load all of the .cu kernels.

How can I implement delayed loading of these kernels?

It seems that if you have a DLL with CUDA code inside, all of the CUDA code goes into VRAM on launch.
For OpenCV in particular, you can work around this by building static libraries only.

That way the unused kernels are cut out.

With the stitching example in OpenCV 3.0 built against static libraries, my VRAM usage went to 1 GB.