I’m currently developing an application with CUDA Beta2 on Linux and have run into some problems. My kernels produce correct results when the code runs in device-emulation mode (-deviceemu), but when it executes on the GPU, the results differ. While compiling, nvcc also emits the following warning for several lines:
My main kernel is defined like this:
Later on, the kernel copies the contents of buffer[idx] into threadbuffer, does some heavy computation in threadbuffer (I avoid working directly in global memory because of the latency), and finally copies threadbuffer back to buffer[idx].
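Stripped down, the pattern looks roughly like this (gpu_inbuffer’s members and the computation are placeholders, not my actual code):

```cpp
// Illustrative sketch only; gpu_inbuffer's members and the
// arithmetic are placeholders for the real code.
typedef struct {
    float data[16];
} gpu_inbuffer;

__device__ void foobar(gpu_inbuffer *ibuffer)
{
    // nvcc warns on calls like this: it cannot tell whether
    // ibuffer points into local or global memory.
    ibuffer->data[0] += 1.0f;
}

__global__ void mainkernel(gpu_inbuffer *buffer)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    gpu_inbuffer threadbuffer = buffer[idx];  // global -> local copy

    foobar(&threadbuffer);                    // pointer to local memory

    buffer[idx] = threadbuffer;               // local -> global copy
}
```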
I guess I lack a proper understanding of how CUDA allocates its memory and how it gets referenced on the GPU. I pass a pointer to threadbuffer to several functions (e.g. __device__ void foobar(gpu_inbuffer *ibuffer)) and get the warning above for every such call.
Could you give me some insight here? I need frequent read and write access to the members of buffer[idx], and I need to pass references to it to several __device__ functions. My guess is that CUDA allocates threadbuffer in local memory but then passes around wild pointers that get dereferenced as if they pointed to global memory.