Trouble with Histogram Function Memory access

I am having trouble with a simple function that is trying to perform a histogram of an image. If I run this function in EmuDebug mode then all is well. But if I run in Debug mode it exits with the following errors

First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012ca74.
First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012ca74.
The thread ‘Win32 Thread’ (0xab8) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0xd40) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0x228) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0xdc8) has exited with code 1 (0x1).
First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012c77c.

The function is as follows

global void
kernel_MakeHistogram( kz_pixel_t* pImage,
unsigned int xIndex,
unsigned int yIndex,
unsigned int uiXRes,
unsigned int uiSizeX,
unsigned int uiSizeY,
kz_pixel_t* pLookupTable,
unsigned long threadHistogram)
/
This function classifies the greylevels present in the array image into

  • a greylevel histogram. The pLookupTable specifies the relationship
  • between the greyvalue of the pixel (typically between 0 and 4095) and
  • the corresponding bin in the histogram .
    */
    {
    const unsigned int ty = threadIdx.y;
    const unsigned int tx = threadIdx.x;

unsigned int x = xIndexuiSizeX + blockIdx.xblockDim.x + tx;
unsigned int y = (yIndexuiSizeY + blockIdx.yblockDim.y + ty)*uiXRes;
unsigned int gmemPos = x + y;

kz_pixel_t tVal = (kz_pixel_t) pImage[gmemPos];
if(tVal > 4095) tVal = 4095;
const kz_pixel_t tempVal = pLookupTable[tVal];
const unsigned int lutIndex= (unsigned int) (ty+1)*tempVal;
threadHistogram[lutIndex]++;

}

The problem seems to be in the last line

threadHistogram[lutIndex]++;

If I put a fixed value as index then all works well. I have checked to see that I
am not indexing out of range.

I was wondering maybe the error code may shed some light on the problem.

Are these error codes explained in any document?

Thanks,

Imran

Are you using CUDA in more than 1 host threads? This is infeasible due to host threads having different memory spaces.

I am not using separate host thread but I have allocated enough memory so that my 16 threads can each work on their portion. But your comment seems to suggest that I cannot share threadHistogram memory array between threads. What do you propose that I do?

One host thread doing all CUDA stuff and message passing.

You have threads writing to the same memory location. This can give problems, even if you have version 0.9 and up (atomic operations). These operations are only available for 32 bits ints.

Thanks for the response. I would like to try atomic functions. But when I implement

the problem line using atomicAdd

atomicAdd(&threadHistogram[lutIndex],1);

The compiler says undefined function. I added -arch sm_11 to the command line but that did not help either. I am using windows XP and 8800 GTX with latest sdk.

The 8800 GTX is only a compute 1.0 device, it cannot perform atomic adds. But, the algorithmic problem with your kernel is not what is causing your memory errors. As others have stated, it is because you are accessing GPU memory from a different thread than allocated it.

I meet the same problem, and can you tell me how to add -arch sm_11 to the command line? I find the definition file of atomicAdd which is sm_11_atomic_functions.h, and I include it in the program, but that did not get any help. why this accurs?

I am using Visual Studio 2003 and I just added it to the comand line. Atomic functions I am told are only supported on arch 1.1. Since I am using 8800 GTx this operation is not supported. Also as suggested my problem can be solved differently. If you are wanting to perform histogram it would be useful to read the Histogram64 doc that is part of the SDK.

If your VS project is based off an SDK example, then right-click on the .cu file in VS’s Solution Explorer, go to Properties, then Custom Build Step. It’s where all the action is.