CUDA error on the first memory allocation call only

I need help finding out why I get an error when I allocate GPU memory to the first pointer in this series of memory allocations:

cudaMalloc((void**)&md_pfDemX, sizeof(float) * iW * iH);
cudaMalloc((void**)&md_pfDemY, sizeof(float) * iW * iH);
cudaMalloc((void**)&md_pfDemZ, sizeof(float) * iW * iH);

When stepping over the first line in the VS2008 debugger, I get the following in the output window:

First-chance exception at 0x74d39617 in SensorView.exe: Microsoft C++ exception: cudaError_enum at memory location 0x001bf998…
First-chance exception at 0x74d39617 in SensorView.exe: Microsoft C++ exception: cudaError_enum at memory location 0x001bf994…
First-chance exception at 0x74d39617 in SensorView.exe: Microsoft C++ exception: cudaError at memory location 0x001bf9e0…

md_pfDemX still has the value 0xCCCCCCCC

Stepping over the subsequent lines produces no further errors. Memory appears to be allocated successfully by those calls, and kernels that do not use md_pfDemX execute as expected.

I can temporarily get around this by repeating the first line, but what would be causing this issue?

System:

Windows 7 32-bit, 4 GB RAM, GTX 580 (headless) and GTS 250. The code runs on the GTX 580 and is compiled for the 2.0 architecture with VS2008. CUDA 4.0, driver version 270.81.
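
The snippet in the question ignores the cudaError_t that each cudaMalloc returns, so the debugger output is the only clue. A minimal sketch of checking those return values follows. The CUDA_CHECK macro and the placeholder sizes are assumptions for illustration, not part of the original code; only cudaMalloc, cudaFree and cudaGetErrorString are real runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): stop with a readable
// message if a runtime API call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed at %s:%d: %s\n", #call,      \
                         __FILE__, __LINE__, cudaGetErrorString(err_));  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

int main()
{
    // Placeholder dimensions; the real iW and iH come from the application.
    const int iW = 1024;
    const int iH = 1024;
    const size_t bytes = sizeof(float) * iW * iH;

    // Initialise the pointers so a failed call is easy to spot.
    float *md_pfDemX = NULL;
    float *md_pfDemY = NULL;
    float *md_pfDemZ = NULL;

    // Each allocation is checked individually, so the first failing call
    // reports its own error string instead of being silently skipped.
    CUDA_CHECK(cudaMalloc((void**)&md_pfDemX, bytes));
    CUDA_CHECK(cudaMalloc((void**)&md_pfDemY, bytes));
    CUDA_CHECK(cudaMalloc((void**)&md_pfDemZ, bytes));

    CUDA_CHECK(cudaFree(md_pfDemZ));
    CUDA_CHECK(cudaFree(md_pfDemY));
    CUDA_CHECK(cudaFree(md_pfDemX));
    return 0;
}
```

The error string printed for the first call is usually more informative than the first-chance exception lines in the output window.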

What do you call before that line?

Regards,
MK

I call:

cudaSetDevice(0);

CUresult CuError = cuInit(0);

Immediately before the memory allocation. I was mistaken when I thought they were running after closer inspection.

I have since solved the problem, however. In VS2008 I can select custom build rules. The CUDA ones are as follows:

  1. CUDA Driver API Build Rule

  2. CUDA Driver API Build Rule (v3.2)

  3. CUDA Driver API Build Rule (v4.0)

  4. CUDA Runtime API Build Rule

  5. CUDA Runtime API Build Rule (v3.2)

  6. CUDA Runtime API Build Rule (v4.0)

I had build rule 6 selected initially. Changing this to build rule 4 and recompiling has fixed the issue. Inspecting the command line output in the .cu file properties indicates that the v3.2 compiler is now being called and not v4.0 as before.

Command line output from Build Rule 4.

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_10,code="sm_10,compute_10" -gencode=arch=compute_20,code="sm_20,compute_20" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /MTd " -I"…\GPU" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -maxrregcount=0 --compile -o "Nexus Debug/initialMapGen.cu.obj" "c:\Dev\04-Real-time-ASP4-WORKING\ASP-GPU\Trunk\SensorView\GPU\initialMapGen.cu"

Command line output from Build Rule 6.

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code="sm_10,compute_10" -gencode=arch=compute_20,code="sm_20,compute_20" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /MTd " -I"…/GPU" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Nexus Debug/initialMapGen.cu.obj" "c:\Dev\04-Real-time-ASP4-WORKING\ASP-GPU\Trunk\SensorView\GPU\initialMapGen.cu"

So the code works fine with build rule 4, but not 6. What would cause this? I am fairly sure I downloaded all the correct files for CUDA 4.0 Development. Maybe I should try reinstalling?
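
Since the two build rules differ only in which toolkit they point at, one thing worth ruling out is a mix of toolkit versions in the same build, for example v4.0 headers with a v3.2 runtime library on the linker path. This is a guess rather than a confirmed cause. The sketch below prints the version the headers were compiled against, the version of the runtime library that actually gets loaded, and the highest version the installed driver supports. It uses only cudaRuntimeGetVersion, cudaDriverGetVersion and the CUDART_VERSION macro from the runtime headers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int runtimeVersion = 0;
    int driverVersion = 0;

    // Version of the CUDA runtime library the executable is linked against.
    cudaError_t err = cudaRuntimeGetVersion(&runtimeVersion);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaRuntimeGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Latest CUDA version supported by the installed display driver.
    err = cudaDriverGetVersion(&driverVersion);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Versions are encoded as 1000 * major + 10 * minor.
    // CUDART_VERSION comes from the headers picked up at compile time.
    std::printf("headers (CUDART_VERSION): %d\n", CUDART_VERSION);
    std::printf("linked runtime library:   %d\n", runtimeVersion);
    std::printf("driver supports up to:    %d\n", driverVersion);

    if (runtimeVersion != CUDART_VERSION) {
        std::printf("headers and runtime library come from different toolkits\n");
    }
    if (driverVersion < runtimeVersion) {
        std::printf("driver is older than the runtime library\n");
    }
    return 0;
}
```

If the first two numbers differ under build rule 6, the include and library directories in the project settings are probably pointing at different toolkit installations.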

Any advice and comments would be much appreciated.

Cheers

Matt

The first bit of my last post should read:

I call:

cudaSetDevice(0);
CUresult CuError = cuInit(0);

Immediately before the memory allocation. After closer inspection, I was mistaken in thinking the kernels were running: I could not debug them with Nsight, and another memory error occurred when the kernel was called.
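
On the initialisation calls above: the runtime API initialises itself on first use, so a separate cuInit is not needed when only runtime calls such as cudaMalloc are made. With two cards installed, it may also be worth confirming which one device 0 actually is. The following is a sketch of runtime-only setup. Selecting the first device with compute capability 2.x or higher is an assumption made here to target the GTX 580, and cudaFree(0) is a common idiom for forcing context creation so that any initialisation error shows up before the first real allocation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // List every device and remember the first one with compute
    // capability 2.x or higher (assumed here to be the GTX 580).
    int chosen = -1;
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }
        std::printf("device %d: %s (compute %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);
        if (chosen < 0 && prop.major >= 2) {
            chosen = dev;
        }
    }
    if (chosen < 0) {
        std::fprintf(stderr, "no device with compute capability >= 2.0 found\n");
        return 1;
    }

    err = cudaSetDevice(chosen);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Force the runtime to create its context now, so that a failure here
    // is not mistaken for a failure of the first cudaMalloc.
    err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context creation: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("using device %d\n", chosen);
    return 0;
}
```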