Trouble with Histogram Function Memory access

Imran_Khan · September 24, 2007, 8:37pm

I am having trouble with a simple function that is trying to perform a histogram of an image. If I run this function in EmuDebug mode then all is well. But if I run in Debug mode it exits with the following errors

First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012ca74.
First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012ca74.
The thread ‘Win32 Thread’ (0xab8) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0xd40) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0x228) has exited with code 1 (0x1).
The thread ‘Win32 Thread’ (0xdc8) has exited with code 1 (0x1).
First-chance exception at 0x7c812a5b in cudaUrban.exe: Microsoft C++ exception: cudaError_enum @ 0x0012c77c.

The function is as follows

global void
kernel_MakeHistogram( kz_pixel_t* pImage,
unsigned int xIndex,
unsigned int yIndex,
unsigned int uiXRes,
unsigned int uiSizeX,
unsigned int uiSizeY,
kz_pixel_t* pLookupTable,
unsigned long threadHistogram)
/ This function classifies the greylevels present in the array image into

a greylevel histogram. The pLookupTable specifies the relationship
between the greyvalue of the pixel (typically between 0 and 4095) and
the corresponding bin in the histogram .
*/
{
const unsigned int ty = threadIdx.y;
const unsigned int tx = threadIdx.x;

unsigned int x = xIndexuiSizeX + blockIdx.xblockDim.x + tx;
unsigned int y = (yIndexuiSizeY + blockIdx.yblockDim.y + ty)*uiXRes;
unsigned int gmemPos = x + y;

kz_pixel_t tVal = (kz_pixel_t) pImage[gmemPos];
if(tVal > 4095) tVal = 4095;
const kz_pixel_t tempVal = pLookupTable[tVal];
const unsigned int lutIndex= (unsigned int) (ty+1)*tempVal;
threadHistogram[lutIndex]++;

}

The problem seems to be in the last line

threadHistogram[lutIndex]++;

If I put a fixed value as index then all works well. I have checked to see that I
am not indexing out of range.

I was wondering maybe the error code may shed some light on the problem.

Are these error codes explained in any document?

Thanks,

Imran

asadafag · September 25, 2007, 2:44am

Are you using CUDA in more than 1 host threads? This is infeasible due to host threads having different memory spaces.

Imran_Khan · September 25, 2007, 12:36pm

I am not using separate host thread but I have allocated enough memory so that my 16 threads can each work on their portion. But your comment seems to suggest that I cannot share threadHistogram memory array between threads. What do you propose that I do?

asadafag · September 25, 2007, 1:24pm

One host thread doing all CUDA stuff and message passing.

S.Warris · September 25, 2007, 1:24pm

You have threads writing to the same memory location. This can give problems, even if you have version 0.9 and up (atomic operations). These operations are only available for 32 bits ints.

Imran_Khan · September 25, 2007, 2:32pm

Thanks for the response. I would like to try atomic functions. But when I implement

the problem line using atomicAdd

atomicAdd(&threadHistogram[lutIndex],1);

The compiler says undefined function. I added -arch sm_11 to the command line but that did not help either. I am using windows XP and 8800 GTX with latest sdk.

MisterAnderson42 · September 25, 2007, 3:03pm

The 8800 GTX is only a compute 1.0 device, it cannot perform atomic adds. But, the algorithmic problem with your kernel is not what is causing your memory errors. As others have stated, it is because you are accessing GPU memory from a different thread than allocated it.

hakuna · September 26, 2007, 7:59am

I meet the same problem, and can you tell me how to add -arch sm_11 to the command line? I find the definition file of atomicAdd which is sm_11_atomic_functions.h, and I include it in the program, but that did not get any help. why this accurs?

Imran_Khan · September 28, 2007, 3:01pm

I am using Visual Studio 2003 and I just added it to the comand line. Atomic functions I am told are only supported on arch 1.1. Since I am using 8800 GTx this operation is not supported. Also as suggested my problem can be solved differently. If you are wanting to perform histogram it would be useful to read the Histogram64 doc that is part of the SDK.

alex_dubinsky · September 28, 2007, 6:11pm

If your VS project is based off an SDK example, then right-click on the .cu file in VS’s Solution Explorer, go to Properties, then Custom Build Step. It’s where all the action is.

Topic		Replies	Views
atomic functions CUDA Programming and Performance	17	14309	April 10, 2011
Histogram implementation CUDA Programming and Performance	12	2349	November 27, 2014
CUDA SDK samples throw errors on application exit CUDA Programming and Performance	2	9292	May 20, 2008
Worse atomic performance in shared than global memory CUDA Programming and Performance	7	8962	August 3, 2017
Necessary includes for atomicAdd? CUDA Programming and Performance	10	25762	June 16, 2021
Shared memory write conflicts Looking for a little help... CUDA Programming and Performance	5	4913	September 7, 2007
can you give me sample code for atomicAdd()? CUDA Programming and Performance	9	48507	June 5, 2009
atomic operations atomicExch is undefined CUDA Programming and Performance	8	17410	November 22, 2008
Newbish shared memory problem CUDA Programming and Performance	3	4290	January 25, 2008
CUDA Error: Invalid Device Function Debugging CUDA errors CUDA Programming and Performance	3	5762	July 29, 2009

Trouble with Histogram Function Memory access

Related topics