CUDA code runs fine on Mac, but not on Windows

I’m trying to run a numerical integration code on two different machines: my Mac laptop with a GeForce GT 650M (compute capability 3.0), and a Windows 7 desktop with a GeForce GTX 580 (compute capability 2.0). Both have CUDA 5.0 installed. The code used to run correctly on the Windows machine, but it no longer does, and I don’t remember making any significant changes. On Windows, the first CUDA kernel call returns garbage, and so far I haven’t been able to find out why, since exactly the same code runs without issue on the Mac. I’m pretty new to CUDA development, so I would really appreciate any ideas about what I should be looking at to diagnose this. I could post the full code, but it’s ~600 lines (plus a MATLAB driver that calls it), and I suspect the problem lies with the system or the compilation rather than the code.

Under Windows, you could try debugging with Nsight (by attaching to the MATLAB process; I’ve been able to do this in the past). My first suggestion would be to disable WDDM TDR (Timeout Detection and Recovery, which by default resets the GPU when a kernel runs for longer than about 2 seconds) on the Windows machine, and at the same time add error checking to your kernel calls to see what sort of errors you get. Going from CC 3.0 to CC 2.0, you could be using some CC 3.0 features without noticing it, and the GPU in the Windows box probably isn’t liking it… just some thoughts.
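A minimal sketch of what error checking your kernel calls can look like, assuming only the CUDA runtime API (everything used here is available in CUDA 5.0). `CUDA_CHECK`, `scaleKernel`, and `run_scale_check` are hypothetical names, and the kernel is a stand-in for the real integration kernel. The key point is to check twice after every launch: `cudaGetLastError()` reports launch failures (for example, an "invalid device function" error when the binary contains no code the GPU can run), and `cudaDeviceSynchronize()` reports errors raised while the kernel was running, such as a watchdog timeout.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: wrap every CUDA runtime call; on failure, print the
// call, file, line, and error string, then abort the program.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error in %s at %s:%d: %s\n",       \
                         #call, __FILE__, __LINE__,                        \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical stand-in for the integration kernel: y[i] = 2 * x[i].
__global__ void scaleKernel(const double *x, double *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0 * x[i];
}

// Runs the kernel once and returns the number of wrong results (0 = success).
int run_scale_check() {
    const int n = 1 << 20;
    std::vector<double> h_x(n), h_y(n, 0.0);
    for (int i = 0; i < n; ++i) h_x[i] = static_cast<double>(i);

    double *d_x = NULL, *d_y = NULL;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(double)));
    CUDA_CHECK(cudaMalloc(&d_y, n * sizeof(double)));
    CUDA_CHECK(cudaMemcpy(d_x, &h_x[0], n * sizeof(double), cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d_x, d_y, n);
    CUDA_CHECK(cudaGetLastError());       // launch-time errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(&h_y[0], d_y, n * sizeof(double), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (h_y[i] != 2.0 * h_x[i]) ++wrong;
    return wrong;
}

int main() {
    // Print which GPU and compute capability the code actually runs on.
    int dev = 0;
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDevice(&dev));
    CUDA_CHECK(cudaGetDeviceProperties(&prop, dev));
    std::printf("Device %d: %s (compute capability %d.%d)\n",
                dev, prop.name, prop.major, prop.minor);

    int wrong = run_scale_check();
    std::printf("%s (%d mismatches)\n", wrong == 0 ? "ok" : "FAILED", wrong);
    return wrong == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

To rule out the CC 3.0 vs CC 2.0 question, it may also help to build for both architectures explicitly, e.g. `nvcc -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 check.cu -o check`. If the Windows build only targets `sm_30`, the launch check above should report the failure instead of silently leaving garbage in the output buffer.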

Urg. There seems to just be some sort of memory corruption issue with MATLAB. After I shut the machine down and restart it, it works again (until the next time it happens).

Make sure you destroy both your MATLAB and CUDA arrays after you’re done with them. MATLAB’s memory manager only reclaims memory it allocated itself (mxMalloc buffers and temporary mxArrays) when the MEX function returns; it knows nothing about device memory, so anything allocated with cudaMalloc() stays allocated until you explicitly free it or the MATLAB process exits. Use cudaFree() for all CUDA allocations, and mxFree / mxDestroyArray as appropriate for MATLAB variables you no longer need, before returning from the MEX file.

On a previous program that used around 2 GB of VRAM, I found I could easily bork my GPU if I didn’t do this.
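A minimal sketch of a MEX gateway that always releases its device memory, on the error path as well as on success, assuming the classic MATLAB MEX C API (`mexFunction`, `mxGetPr`, `mexErrMsgIdAndTxt`) compiled as a `.cu` file; the build step is not shown. `scaleKernel` and the `scale_mex:*` error IDs are hypothetical. The detail worth copying is that `mexErrMsgIdAndTxt` never returns to the caller, so every `cudaFree()` has to happen before it is called, otherwise a failed call leaks device memory that accumulates across calls.

```cuda
// Hypothetical MEX file (e.g. scale_mex.cu): returns 2 * x computed on the GPU.
#include "mex.h"
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: y[i] = 2 * x[i].
__global__ void scaleKernel(const double *x, double *y, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = 2.0 * x[i];
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgIdAndTxt("scale_mex:input", "Expected one real double array.");

    const size_t n = mxGetNumberOfElements(prhs[0]);
    const size_t bytes = n * sizeof(double);

    // The output goes back to MATLAB through plhs, so it must NOT be destroyed here.
    plhs[0] = mxCreateDoubleMatrix(mxGetM(prhs[0]), mxGetN(prhs[0]), mxREAL);
    if (n == 0) return;  // nothing to compute; a zero-block launch would fail

    double *d_x = NULL, *d_y = NULL;
    cudaError_t err = cudaMalloc(&d_x, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_y, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, mxGetPr(prhs[0]), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (int)((n + threads - 1) / threads);
        scaleKernel<<<blocks, threads>>>(d_x, d_y, n);
        err = cudaGetLastError();          // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(mxGetPr(plhs[0]), d_y, bytes, cudaMemcpyDeviceToHost);

    // Release device memory on every path; cudaFree(NULL) is a no-op.
    // Any scratch mxMalloc buffers or temporary mxArrays would be released
    // here too, with mxFree / mxDestroyArray.
    cudaFree(d_x);
    cudaFree(d_y);

    // Only report the error after cleanup, because this call does not return.
    if (err != cudaSuccess)
        mexErrMsgIdAndTxt("scale_mex:cuda", "CUDA error: %s", cudaGetErrorString(err));
}
```

Collecting the first error in `err` and skipping the remaining steps keeps a single cleanup point, which is harder to get wrong than freeing memory separately in each failure branch.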