Strange global memory problem

I have a kernel whose global memory writes do not persist when the application is compiled for the device. However, when I compile it in emulation mode, the writes persist after the kernel returns.

The code below takes all of its pointer arguments as global memory. It tries to assign a randomly generated 2D coordinate to each object. Coordinates are constrained by an image mask, d_msk. Successful writes are recorded with a “1” in the success array.


/*
 * Determine the starting coordinates for each object.
 */
__global__ void
d_assignObjectPositions(uchar *d_msk, uint imageWidth, uint imageHeight, float *randomNumsX, float *randomNumsY, uint4 *objects, uint *success, int N)
{
	unsigned int idx = blockIdx.x*blockDim.x + threadIdx.x;
	unsigned int pos = 0;
	unsigned int randX = 0;
	unsigned int randY = 0;
	uchar a, b, c;

	if (idx < N) {
		// Assign position if no previous successful position assignments
		if (success[idx] == 0) {
			randX = (unsigned int)floor(randomNumsX[idx] * imageWidth);
			randY = (unsigned int)floor(randomNumsY[idx] * imageHeight);

			// The mask is indexed with 4 bytes per pixel
			pos = (randX + (randY*imageWidth)) * 4;
			a = d_msk[pos];
			b = d_msk[pos+1];
			c = d_msk[pos+2];

			// Assign position if this location in the texture mask is open
			if (a != 0x00 || b != 0x00 || c != 0x00) {
				objects[idx].x = randX;
				objects[idx].y = randY;
				success[idx] = 1;           // This write only persists in emulator
			}
		}
	}
}

Some other notes:

Development platform: Ubuntu Linux 8.04, 32 bit.

Driver: 177.13

CUDA Toolkit: version 2.0 beta 2 for Ubuntu 7.10

CUDA SDK: 2.0 beta 2

GPU: 8800 GT

Compiler: g++ 4.2.3

I’d greatly appreciate any suggestions. I’m a bit new to CUDA and GPGPU.


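Well, figured it out. It seems I allocated the objects array as host memory using malloc instead of as device memory using cudaMalloc. That also explains the emulator behaviour: in emulation mode the kernel runs on the host, so a host pointer is valid there, while on the device it is not. I guess this isn’t a very auspicious beginning to my CUDA experience.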

On the bright side, however, things can only get better from here :smile:
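
For anyone who hits the same symptom, here is a minimal sketch of the fix described above: allocate every array the kernel writes to with cudaMalloc, and check for errors after the launch so that a bad pointer shows up right away instead of as silently missing writes. The CUDA_CHECK macro and the markObjects kernel are hypothetical stand-ins written for this example, not the original code. It also assumes a reasonably recent toolkit; on the 2.0 toolkit the synchronisation call was named cudaThreadSynchronize rather than cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Stand-in kernel: writes to both arrays, as the kernel in the post does.
__global__ void markObjects(uint4 *objects, unsigned int *success, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        objects[idx].x = idx;
        objects[idx].y = 2 * idx;
        success[idx] = 1;
    }
}

int main()
{
    const int n = 256;
    uint4 *d_objects = NULL;
    unsigned int *d_success = NULL;

    // Both output arrays live in device memory (cudaMalloc, not malloc).
    CUDA_CHECK(cudaMalloc((void **)&d_objects, n * sizeof(uint4)));
    CUDA_CHECK(cudaMalloc((void **)&d_success, n * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(d_objects, 0, n * sizeof(uint4)));
    CUDA_CHECK(cudaMemset(d_success, 0, n * sizeof(unsigned int)));

    markObjects<<<(n + 127) / 128, 128>>>(d_objects, d_success, n);
    CUDA_CHECK(cudaGetLastError());        // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran

    // Copy the flags back to a host buffer before reading them on the CPU.
    unsigned int *h_success = (unsigned int *)malloc(n * sizeof(unsigned int));
    if (h_success == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    CUDA_CHECK(cudaMemcpy(h_success, d_success, n * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));

    int count = 0;
    for (int i = 0; i < n; ++i) {
        count += (int)h_success[i];
    }
    printf("%d of %d writes visible on the host\n", count, n);

    free(h_success);
    CUDA_CHECK(cudaFree(d_objects));
    CUDA_CHECK(cudaFree(d_success));
    return 0;
}
```

Checking the return value of the synchronisation call is the part that would most likely have pointed at the problem here, since errors raised while a kernel runs are only reported by a later runtime call.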