Computation on GPU differs from DevEmu

Hi there,

I’m currently developing an application with CUDA Beta2 on Linux and have run into a problem. My kernels produce correct results when I run the code in device emulation mode, but the results differ when the same code runs on the GPU. Also, while compiling, nvcc emits the following warning for several lines:

My main kernel is defined like this

Later on, the kernel copies the content of buffer[idx] to threadbuffer, does some heavy computation in threadbuffer (I fear modifying it in global memory due to latency) and finally copies threadbuffer back to buffer[idx].

I guess I lack a good understanding of how CUDA allocates its memory and how that memory gets referenced on the GPU. I pass a pointer to threadbuffer to several functions (e.g. __device__ void foobar(gpu_inbuffer *ibuffer)) and get a warning like the one above for every line that does this.

Could you help me out with some insight on that? I need frequent read and write access to members of buffer[idx] and I need to pass references to it to several device functions. My guess is that CUDA allocates tbuffer in local memory and then passes around pointers that get dereferenced as if they pointed to global memory.
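For readers following along, here is a minimal sketch of the pattern described above. The real code was not posted, so everything here is an assumption made for illustration: the layout of gpu_inbuffer, the body of foobar, the kernel name process and the host helper run_process are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the real (undisclosed) struct.
struct gpu_inbuffer {
    int data[8];
};

// Device helper that works through a pointer. Which memory space the
// pointer refers to depends on what the caller passes in.
__device__ void foobar(gpu_inbuffer *ibuffer)
{
    for (int i = 0; i < 8; ++i)
        ibuffer->data[i] = ibuffer->data[i] * 2 + 1;
}

__global__ void process(gpu_inbuffer *buffer, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    gpu_inbuffer threadbuffer = buffer[idx];  // per-thread copy (registers or local memory)
    foobar(&threadbuffer);                    // pointer to a per-thread object, not to global memory
    buffer[idx] = threadbuffer;               // write the result back to global memory
}

// Host helper (hypothetical): runs the kernel on n > 0 elements and returns
// the number of elements that differ from a CPU reference, or -1 on a CUDA error.
int run_process(int n)
{
    std::vector<gpu_inbuffer> host(n), expected(n);
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < 8; ++i) {
            host[k].data[i] = k + i;
            expected[k].data[i] = (k + i) * 2 + 1;
        }

    gpu_inbuffer *dev = NULL;
    size_t bytes = n * sizeof(gpu_inbuffer);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 64;
    process<<<(n + threads - 1) / threads, threads>>>(dev, n);

    // This copy waits for the kernel to finish and reports launch or execution errors.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < 8; ++i)
            if (host[k].data[i] != expected[k].data[i]) { ++mismatches; break; }
    return mismatches;
}
```

Calling run_process(1000) from a small main() and checking for a return value of 0 gives a quick GPU-versus-CPU comparison for this pattern.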

Please provide a complete test app which reproduces this problem, along with build instructions, an nvidia-bug-report.log and the output from “nvcc -V”.


Thanks for replying so fast. I’m currently not allowed to disclose any of the project’s real code, so I can’t post it here. I’ll try to create a dummy scenario that resembles the problem.

In the meantime, can someone give me some general insight into the warning about dangling pointers mentioned above?

That warning is usually harmless, unless the memory you are dereferencing is actually in shared memory.

In your case it could very well be the problem, though: tbuffer is a local array, so it gets placed in local memory. Local memory is a region of global memory, so you aren’t really saving any latency or bandwidth, and shared memory may be a better option. As far as I know, local memory is addressed differently from global memory, which is probably why the compiler’s assumption that it can dereference the pointer as global memory becomes a problem.
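As a rough sketch of the shared memory option, each thread could use its own slot in a per-block __shared__ array as scratch space. The struct layout, the computation, the THREADS_PER_BLOCK constant and the helper run_process_shared are assumptions made for illustration and are not taken from the original code.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREADS_PER_BLOCK 64  // assumed block size; the kernel must be launched with exactly this

// Hypothetical stand-in for the real (undisclosed) struct.
struct gpu_inbuffer {
    int data[8];
};

__global__ void process_shared(gpu_inbuffer *buffer, int n)
{
    // One scratch slot per thread in on-chip shared memory (64 * 32 bytes = 2 KB per block).
    __shared__ gpu_inbuffer scratch[THREADS_PER_BLOCK];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;

    gpu_inbuffer &mine = scratch[threadIdx.x];  // each thread touches only its own slot,
    mine = buffer[idx];                         // so no __syncthreads() is needed here
    for (int i = 0; i < 8; ++i)
        mine.data[i] = mine.data[i] * 2 + 1;    // placeholder for the heavy computation
    buffer[idx] = mine;                         // copy the result back to global memory
}

// Host helper (hypothetical): returns the number of elements that differ from
// a CPU reference for n > 0 elements, or -1 on a CUDA error.
int run_process_shared(int n)
{
    std::vector<gpu_inbuffer> host(n), expected(n);
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < 8; ++i) {
            host[k].data[i] = k + i;
            expected[k].data[i] = (k + i) * 2 + 1;
        }

    gpu_inbuffer *dev = NULL;
    size_t bytes = n * sizeof(gpu_inbuffer);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    process_shared<<<blocks, THREADS_PER_BLOCK>>>(dev, n);

    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < 8; ++i)
            if (host[k].data[i] != expected[k].data[i]) { ++mismatches; break; }
    return mismatches;
}
```

Two caveats with this sketch: shared memory is small, so the struct size times the block size has to fit, and an array-of-structs layout like this one is likely to cause shared memory bank conflicts, so a different layout may be needed for good performance.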

If this is the case, a simple test case only needs to create a local array and do some pointer arithmetic on it to confuse the compiler. Dereferencing the resulting pointer should then reproduce both the warning and the incorrect code.
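Something along these lines might serve as a starting point for such a test case. It is only a sketch: all names (compute, sum_via_pointer, repro, count_mismatches) are made up, and whether it actually triggers the warning or a wrong result depends on the toolchain version, so treat it as untested against CUDA Beta2.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Dereferences a pointer whose memory space the compiler cannot see from here.
__host__ __device__ int sum_via_pointer(int *p, int count)
{
    int sum = 0;
    for (int i = 0; i < count; ++i)
        sum += *(p + i);
    return sum;
}

// Same arithmetic on host and device, so the two results can be compared directly.
__host__ __device__ int compute(int seed)
{
    int local[16];                 // per-thread array, likely placed in local memory on the GPU
    for (int i = 0; i < 16; ++i)
        local[i] = seed + i;
    int *p = local + (seed & 7);   // data-dependent pointer arithmetic into the local array
    return sum_via_pointer(p, 8);  // reads local[(seed & 7)] .. local[(seed & 7) + 7]
}

__global__ void repro(int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = compute(idx);
}

// Host helper (hypothetical): returns how many GPU results differ from the
// host results for n > 0 elements, or -1 on a CUDA error.
int count_mismatches(int n)
{
    int *dev = NULL;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return -1;

    int threads = 64;
    repro<<<(n + threads - 1) / threads, threads>>>(dev, n);

    std::vector<int> gpu(n);
    cudaError_t err = cudaMemcpy(gpu.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int k = 0; k < n; ++k)
        if (gpu[k] != compute(k)) ++mismatches;  // host-side reference
    return mismatches;
}
```

A nonzero return from count_mismatches(1000) would point at the GPU code path rather than at the algorithm itself, since the host runs the identical source.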