atomicAdd problems.

Hi all,

I'm having problems with the atomicAdd function (the unsigned int version).
I have a buffer in global memory that is a large 2D array of counters, 1024x1024.
Each thread does some work and, at the very end, increments one of the counters in that array. Which counter a thread increments depends on the input data; it is effectively random and unrelated to the thread index. A few threads may occasionally increment the same counter, but mostly each thread increments its own.
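A minimal sketch of the pattern described above, written against a current CUDA toolkit. It assumes the counters are stored row-major as `unsigned int` and that each thread's target cell arrives as a precomputed `(x, y)` pair. `accumulate`, `accumulate_on_device`, `WIDTH`, `HEIGHT` and the `bins` input are hypothetical names, not the original code. Scattered addresses are not a problem for atomicAdd in itself (32-bit global atomicAdd needs compute capability 1.1 or higher, which every target in the build matches), but atomicAdd does no bounds checking, so the data-dependent index is guarded before use.

```cuda
#include <cuda_runtime.h>

// Hypothetical layout: WIDTH x HEIGHT unsigned counters, row-major.
constexpr int WIDTH  = 1024;
constexpr int HEIGHT = 1024;

// Each thread reads its (x, y) target from `bins` (standing in for the real,
// data-dependent computation) and increments that counter atomically.
__global__ void accumulate(const int2* bins, int n, unsigned int* counters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                 // surplus threads in the last block

    int2 b = bins[i];
    // Check x and y separately: a bad x can still produce an in-range flat
    // index that silently lands on the wrong row.
    if (b.x < 0 || b.x >= WIDTH || b.y < 0 || b.y >= HEIGHT) return;

    atomicAdd(&counters[b.y * WIDTH + b.x], 1u);
}

// Copies n bins to the device, zeroes the counters, runs the kernel and
// copies all WIDTH*HEIGHT counters back. Returns the first error seen.
cudaError_t accumulate_on_device(const int2* h_bins, int n, unsigned int* h_counters)
{
    const size_t counterBytes = size_t(WIDTH) * HEIGHT * sizeof(unsigned int);
    int2* d_bins = nullptr;
    unsigned int* d_counters = nullptr;

    cudaError_t err = cudaMalloc(&d_bins, n * sizeof(int2));
    if (err == cudaSuccess) err = cudaMalloc(&d_counters, counterBytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_bins, h_bins, n * sizeof(int2), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(d_counters, 0, counterBytes);
    if (err == cudaSuccess) {
        const int block = 256;
        accumulate<<<(n + block - 1) / block, block>>>(d_bins, n, d_counters);
        err = cudaGetLastError();       // launch-configuration errors
    }
    // The blocking copy waits for the kernel and also reports any fault
    // raised while it ran (e.g. "unspecified launch failure").
    if (err == cudaSuccess)
        err = cudaMemcpy(h_counters, d_counters, counterBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_bins);                   // cudaFree(nullptr) is a no-op
    cudaFree(d_counters);
    return err;
}
```

Without the guard, a bin such as `(1024, 0)` would map to flat index 1024 and increment cell `(0, 1)` without any error, while a large enough bad index would write outside the buffer entirely.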

Everything works perfectly on the GF400, but I get a problem on the GF9000 and GF200 (I haven't tried the GF8000 because we don't support it): the kernel launch fails with an “unspecified launch failure” message. If I comment out the last line, the one with atomicAdd, it works. The kernel doesn't contain any loops, and it's quite fast.

Could the fact that each thread passes an effectively random memory address to atomicAdd cause problems?

Any ideas?

Thanks.

How are you compiling? I’d guess that you are generating code only for compute capability 2.x.

No, I compile for compute capabilities 1.1, 1.3 and 2.0. I actually have many kernels in this program that work on the GF200 and GF9000, including kernels that use atomicAdd.

Here is my command line (grabbed from VS).

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\bin\nvcc.exe" -gencode=arch=compute_11,code="sm_11,compute_11" -gencode=arch=compute_13,code="sm_13,compute_13" -gencode=arch=compute_20,code="sm_20,compute_20" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -DWIN32 -DNDEBUG -D_WINDOWS -D_USRDLL -DFILEVOXELSOURCE_EXPORTS -DUNICODE -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MD" -I"W:\Scopic\Voxel\trunk\Dependencies\Include" -I"W:\Scopic\Voxel\trunk\GigaVoxelsRender\Include" -I"W:\Scopic\Voxel\trunk\Dependencies\Include\WmFoundation" -I"W:\Scopic\Voxel\trunk\Dependencies\Include\Cuda" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\include" -maxrregcount=32 --ptxas-options=-v --compile -o "Win32\Release/FileVoxelLib.vcproj.obj" "w:\Scopic\Voxel\trunk\FileVoxelLib\FileVoxelLib.vcproj"

Well, I found that the problem isn't related to atomicAdd; it's caused by a memory access violation. For some reason it runs silently, without any error, on my GF460, but fails on the other cards.
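Whether a stray write raises “unspecified launch failure” depends on where it lands, so a missing error on one GPU does not prove the kernel is correct. A sketch of one way to make this class of bug visible on every card, assuming a flat array of `numCounters` counters: check for errors right after the launch, and, while debugging, tally out-of-range indices instead of writing through them. `CUDA_CHECK`, `increment_or_flag` and `count_bad_indices` are hypothetical names. The code targets a current toolkit, where `cudaDeviceSynchronize` replaces the `cudaThreadSynchronize` call available in CUDA 3.2.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file, line and the runtime's error string if a CUDA call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s -> %s\n", __FILE__, __LINE__, \
                         #call, cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Debug variant: a thread with an out-of-range index bumps *bad instead of
// writing outside `counters`.
__global__ void increment_or_flag(const int* idx, int n, unsigned int* counters,
                                  int numCounters, unsigned int* bad)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int k = idx[i];
    if (k < 0 || k >= numCounters) {
        atomicAdd(bad, 1u);
        return;
    }
    atomicAdd(&counters[k], 1u);
}

// Runs the kernel over n host-side indices; returns how many were out of range.
unsigned int count_bad_indices(const int* h_idx, int n, int numCounters)
{
    int* d_idx = nullptr;
    unsigned int* d_counters = nullptr;
    unsigned int* d_bad = nullptr;
    CUDA_CHECK(cudaMalloc(&d_idx, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_counters, numCounters * sizeof(unsigned int)));
    CUDA_CHECK(cudaMalloc(&d_bad, sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_idx, h_idx, n * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_counters, 0, numCounters * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(d_bad, 0, sizeof(unsigned int)));

    const int block = 256;
    increment_or_flag<<<(n + block - 1) / block, block>>>(d_idx, n, d_counters,
                                                          numCounters, d_bad);
    CUDA_CHECK(cudaGetLastError());       // bad launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // faults raised while the kernel ran

    unsigned int h_bad = 0;
    CUDA_CHECK(cudaMemcpy(&h_bad, d_bad, sizeof(unsigned int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_idx));
    CUDA_CHECK(cudaFree(d_counters));
    CUDA_CHECK(cudaFree(d_bad));
    return h_bad;
}
```

Running the unmodified program under the toolkit's memory checker (cuda-memcheck in older toolkits, compute-sanitizer in current ones) should also report the faulting thread and address directly.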