Hello,
My code is crashing with the error in the title on a cudaMalloc call.
This is strange to me because 1) it's not a kernel, and 2) the amount of memory requested is far less than the amount free on the device. I checked the free space right before this call, and that check succeeds, which is how I know there is plenty of space.
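A minimal sketch of the check described above, assuming the free-space query is cudaMemGetInfo. The 64 MB request size and the variable names are placeholders for illustration, not taken from the real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hypothetical request size; a placeholder for the real allocation.
    const size_t bytesWanted = 64 * 1024 * 1024;

    // Query free and total device memory right before allocating.
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu bytes, total: %zu bytes\n", freeBytes, totalBytes);

    // Attempt the allocation and report the runtime's error string on failure.
    void* devPtr = NULL;
    err = cudaMalloc(&devPtr, bytesWanted);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc of %zu bytes failed: %s\n",
                bytesWanted, cudaGetErrorString(err));
        return 1;
    }

    cudaFree(devPtr);
    return 0;
}
```

In this sketch the memory query succeeds and reports far more free memory than the request, which matches the situation described: the failure is reported by the cudaMalloc call itself.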
This only started happening after upgrading to CUDA 4.0 earlier today.
Does anyone have any suggestions on what to look for?