beginner with boundary problems

I am doing a simple application of CUDA in my program but seem to be having problems at the boundary between my CUDA functions and the regular side of my application. On the application side (a C++ application in Visual Studio 2008, compiled by its C++ compiler) I have an image buffer (X x Y pixels, two bytes deep, passed as an unsigned short array of size X x Y) that I pass to a function in a .cu file (compiled by nvcc). In the .cu function I just set the image buffer values to 0 (I don’t copy it to device memory or anything; it stays in host memory), then return to the application side and pass the buffer up to the image viewer to see what I got. The 0’s are only in about the first quarter of the image buffer; the rest is untouched.

Fine, I thought, so my scaling is off (maybe it thinks the array is an array of 8-byte doubles or something when it gets there). But if, before passing the image array into the .cu function, I loop through the array and just touch each element, then when I pass it to the .cu function, get it back and display it, the whole image is 0’s, as I would expect. Does anybody know of any weird differences in how nvcc and the Visual Studio C++ compiler handle things that might cause this?

There shouldn’t be any problem; nvcc sees the memory the same way Visual Studio does. This is either a casting issue (working with the wrong element size), wrong loop limits, or you’re allocating the buffer with IPP or something similar that uses pitched memory, and you’re writing into the padding at the row boundaries.

You can try posting the buffer definition, the allocation, the loop that works in Visual Studio and the loop that doesn’t work under nvcc. That will give a bit more information to go on.
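To make the pitched-memory possibility concrete, here is a minimal sketch of clearing a 16-bit image whose rows may be padded. The function name zeroImage16 and the pitchBytes parameter are made up for illustration and do not come from IPP or from the code in this thread; the point is only that rows are stepped by the pitch in bytes rather than by the visible width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero a 16-bit image whose rows may be padded.
// pitchBytes is the distance in bytes between the starts of consecutive rows;
// it equals width * sizeof(unsigned short) only when there is no padding.
void zeroImage16(unsigned short* pixels, std::size_t width, std::size_t height,
                 std::size_t pitchBytes)
{
    const std::size_t rowBytes = width * sizeof(unsigned short);
    assert(pitchBytes >= rowBytes);  // a pitch smaller than a row is a caller bug

    unsigned char* base = reinterpret_cast<unsigned char*>(pixels);
    for (std::size_t y = 0; y < height; ++y) {
        // Step by the pitch, but clear only the visible pixels of each row,
        // leaving the padding bytes alone.
        std::memset(base + y * pitchBytes, 0, rowBytes);
    }
}
```

If the buffer is tightly packed, passing width * sizeof(unsigned short) as the pitch gives the same result as a plain double loop over rows and columns.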

Here is the part of my code that calls my runTest function in my .cu code. The pixel buffer is an unsigned char array with two bytes per pixel; I show its allocation here commented out because it is actually allocated somewhere else.

// pixels_ = new unsigned char [xSize * ySize * pixDepth];

But the really interesting part is this: with the loop that just touches the memory commented out, when I am in the debugger and have just returned from the call to runTest, the whole buffer is set to 0’s as you would expect. But by the time it gets back to the image viewer, the last 3/4 of the memory has been screwed up. If I touch all the memory before I make the function call, then it makes it all the way back to the image viewer as 0’s, as I would expect.

The image viewer is a program called ImageJ (open source Java from the NIH), and this C++ code is part of an open source program called Micro Manager that wraps the C++ into a Java jar with ACE wrappers that can then be called from ImageJ. A lot happens to this memory as it goes from here through the ACE wrappers to ImageJ, and I will have to check that as well. But it is interesting that if I just touch the memory it gets through, and if I don’t it gets screwed up.

void* pixBuffer = const_cast<unsigned char*> (img_.GetPixels());
RETURN_DEVICE_ERR_IF_CAM_ERROR(pl_exp_start_seq(hPVCAM_, pixBuffer));

for( int i = 0; i < (img_.Height()); ++i)
{
	for( int j = 0; j < (img_.Width()); ++j)
	{
		((unsigned short*)pixBuffer)[i*img_.Width() + j];  // this is where I touch the memory
	}
}

runTest((unsigned short*)pixBuffer);

That part was in the normal C++ code; runTest is in the .cu file.

void runTest(unsigned short* idata)
{
	// size of the matrix
	const unsigned int size_x = 1392;
	const unsigned int size_y = 1040;

	// size of memory required to store the matrix
	const unsigned int mem_size = 2 * size_x * size_y;

	cudaSetDevice( cutGetMaxGflopsDeviceId() );

	unsigned short* h_idata = (unsigned short*) idata;

	for(int i = 0; i < (size_y); ++i)
	{
		for(int j = 0; j < (size_x); ++j)
		{
			((unsigned short*)h_idata)[i*size_x + j] = 0;
		}
	}
}

You’ve got all sorts of unnecessary and questionable casts. Instead of passing the image size in, you define it inside the function; did you make sure that it matches the actual image?
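As a sketch of what passing the size in could look like, the variant below takes the dimensions from the caller and needs no casts inside the function. The name clearImage is hypothetical and not part of the posted code; it assumes a tightly packed buffer of width * height 16-bit pixels.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of runTest: the caller supplies the dimensions, so the
// function cannot silently disagree with the real image size.
void clearImage(unsigned short* pixels, std::size_t width, std::size_t height)
{
    assert(pixels != NULL);

    // Assumes a tightly packed buffer holding width * height pixels.
    const std::size_t count = width * height;
    for (std::size_t i = 0; i < count; ++i) {
        pixels[i] = 0;
    }
}
```

On the calling side, the width and height would come from img_.Width() and img_.Height(), with a single cast of pixBuffer to unsigned short* at the call.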

In any case I doubt that any of these is the culprit. It sounds like a timing issue. My guess is that

RETURN_DEVICE_ERR_IF_CAM_ERROR(pl_exp_start_seq(hPVCAM_, pixBuffer))

is an asynchronous call. What happens if you comment this line out?

Another possibility is that the buffer returned by img_.GetPixels() is filled asynchronously somewhere. You could also allocate the data yourself and see what happens.
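To illustrate the kind of race being suspected here, the following sketch imitates an asynchronous writer with a background thread. FakeExposure and exposeThenClear are invented for this example and are NOT the PVCAM API; the sketch also assumes a C++11 compiler, which is newer than Visual Studio 2008. If this is what is happening, it might also explain why the touch loop seemed to help: it may simply have delayed the zeroing long enough for the writer to finish.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a camera driver that fills the buffer in the background.
// This is NOT the PVCAM API; it only imitates an asynchronous writer.
struct FakeExposure {
    std::atomic<bool> done{false};
    std::thread writer;

    // Returns immediately; the buffer is filled on another thread.
    void start(std::vector<unsigned short>& buf) {
        writer = std::thread([this, &buf] {
            for (std::size_t i = 0; i < buf.size(); ++i) {
                buf[i] = 0xFFFF;  // pretend sensor data arriving
            }
            done.store(true, std::memory_order_release);
        });
    }

    // Poll until the writer reports completion, then reclaim the thread.
    void wait() {
        while (!done.load(std::memory_order_acquire)) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        writer.join();
    }
};

// Start an exposure, wait for it to finish, and only then clear the image.
std::vector<unsigned short> exposeThenClear(std::size_t width, std::size_t height) {
    std::vector<unsigned short> buf(width * height, 0x1234);
    FakeExposure exposure;
    exposure.start(buf);
    exposure.wait();  // without this, the loop below races with the writer
    for (std::size_t i = 0; i < buf.size(); ++i) {
        buf[i] = 0;
    }
    return buf;
}
```

Removing the call to wait() reproduces the general symptom: some of the zeros can be overwritten by the writer after the clearing loop has already passed them.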


Thanks a lot, you were quite right. The program is doing a camera exposure and was not waiting on the right status information to determine when it was done, so we were trying to work with the image while it was still being written to. Thanks again.