When I compile the attached CUDA code on 64-bit Linux (G80 GTS 640 MB, AMD Opteron),
the second kernel parameter seems to get corrupted.
What the code should do:
- allocate memory on the device
- launch the kernel with an int parameter (a dummy to trigger the error) and the device pointer to the memory allocated
- set the value of the memory the second parameter points to (666 in this case)
- copy the device memory back to the host
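The steps above can be sketched roughly as follows. This is a minimal reconstruction and not the attached file: the names `set_value`, `run_roundtrip`, `d_value` and `h_value` are hypothetical, and the launch uses a single thread of a single block purely for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the int is the dummy first parameter, the pointer is
// the second parameter that appears to arrive damaged on the device.
__global__ void set_value(int dummy, int *out)
{
    (void)dummy;   // unused, present only to reproduce the parameter layout
    *out = 666;    // write the marker value through the second parameter
}

// Runs the four steps from the list. Returns 0 on success and stores the
// value copied back from the device in *result.
static int run_roundtrip(int *result)
{
    int *d_value = NULL;   // device memory
    int  h_value = 0;      // host copy

    // 1. allocate memory on the device
    if (cudaMalloc((void **)&d_value, sizeof(int)) != cudaSuccess)
        return 1;

    // 2. and 3. launch the kernel with the dummy int and the device pointer;
    // the kernel sets the pointed-to value
    set_value<<<1, 1>>>(0, d_value);
    cudaError_t err = cudaGetLastError();

    // 4. copy the device memory back to the host
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_value);
    if (err != cudaSuccess)
        return 1;

    *result = h_value;
    return 0;
}

int main(void)
{
    int value = 0;
    if (run_roundtrip(&value) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("value read back from device: %d\n", value);
    return value == 666 ? 0 : 1;
}
```

If the second parameter reaches the kernel intact, the value read back should be 666.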
However, the number assigned to the device memory (666) isn’t returned; garbage is returned instead.
When I compile for emulation and printf the pointer parameter inside the __global__ function, its value differs from the address stored in the pointer on the host.
If I pass a “long” value to the kernel as the first argument (while keeping the kernel signature as “int”), everything works.
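One possible reading of this observation, offered as an assumption and not confirmed anywhere in this post: on 64-bit Linux an int typically occupies 4 bytes while a long and a pointer occupy 8, so passing a long may shift the pointer to the 8-byte-aligned offset at which it is expected. The host-only sketch below merely prints those sizes and the offset the compiler picks for a pointer that follows an int; the struct `KernelArgs` is a hypothetical mirror of the parameter list, not part of the attached code.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical mirror of the kernel's parameter list: an int followed by a
// pointer. The compiler inserts padding so the pointer is suitably aligned.
struct KernelArgs {
    int  dummy;
    int *out;
};

int main(void)
{
    // Sizes of the types involved in the workaround.
    printf("sizeof(int)  = %zu\n", sizeof(int));
    printf("sizeof(long) = %zu\n", sizeof(long));
    printf("sizeof(int*) = %zu\n", sizeof(int *));

    // Offset of the pointer when it follows a 4-byte int.
    printf("offset of pointer after int = %zu\n", offsetof(KernelArgs, out));
    return 0;
}
```

If the printed offset is larger than sizeof(int), the pointer does not sit directly behind the int, which would be consistent with the “long” workaround, though it does not prove that this is the cause.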
Could someone please check whether they can reproduce the problem on their machine? Or is there an error in my code?
I would be very glad for any help or pointers on how to correct the problem.
cuda_64bit_linux_problem.zip (1.15 KB)