Strange Problem: cudaMemcpy on GPU vs Device Emulation

hello,

i have a small cuda program like this:

GLMmodel *_glm;
GLMmodel *v = (GLMmodel *)a; // a is a void* function parameter; i've tried using it directly in cudaMemcpy, same result
size_t size = sizeof(GLMmodel);

cudaError_t mem_alloc, mem_transfer;

mem_alloc = cudaMalloc((void**) &_glm, size);
std::cout << "mem_alloc: " << mem_alloc << std::endl;

mem_transfer = cudaMemcpy((void *)_glm, (void *)v, size, cudaMemcpyHostToDevice);
std::cout << "mem_transfer: " << mem_transfer << std::endl;

std::cout << "number of triangles: " << _glm->numtriangles << std::endl; //return 12 like the original in device emulation or a strange big value using the gpu

everything returns success, but when i access that GLuint variable in *_glm it returns 1378016 or something when it should return 12.
i've checked the sizes, the source and destination values, the casts, memsets, cudaGetLastError and other debug info (all success), and i don't understand this difference.
i even tried the cuda visual profiler and it shows a completed memory copy with the correct size…
the worst part is, if i use device emulation, the GLuint variable in *_glm is correct and returns 12 (the program even behaves correctly some lines later, in a kernel call). this is very strange, and this function (a very important one) working this way is a major problem in my development.
i've read two other threads but no one seems to have found a solution :(.
i am working on a Mac mini with a GeForce 9400M, Mac OS X 10.5.8, and the latest versions of the toolkit, driver and sdk, and i am able to run some sdk examples.
i'm sorry if i'm making a stupid mistake.

thank you very much.

two things: 1) I presume when you say that a is a function parameter, you mean it is passed from main into the function, correct? (as opposed to a function pointer, which can’t be used) 2) is the code compact enough that you can post it here, and is it portable? I have a couple machines I could try it out on to make sure that it’s actually a bug in the code, rather than a driver/api issue, if you think that would help.

hello PTThompson,

1) yes, you are correct, my english is another big bug :)

2) thank you very much for your help! i've attached a zip file containing the binaries/project files from xcode 3.1.3 and visual studio 2005, both configured with device emulation off. in that example the number of triangles should be 12, but with emulation off it gives strange values like '12332423', or '0' on windows (with an exception). i tried compiling and running my code on windows and, without emulation, there was a crash after cudaMemcpy when i tried to read the value (number of triangles) from the copied pointer. the visual studio debugger stopped immediately after the line where i print the number of triangles from the newly copied pointer, and it showed the symbols ??? all over the variables in the pointer's data type structure, meaning it was probably not initialized…but the cudaMemcpy operation returned success. with the device emulator on, everything runs ok. i don't understand…

once again, thank you.
BigBug_.zip (2.54 MB)

hello there,

i tested this situation in a new project (attached), built from a visual studio template sample file, but this time the copy was from device to host and the problem persists.
this is indeed a very strange situation.
until there's a new release, or until i/someone finds the problem and/or the solution, i'm developing in device emulation mode… :(

thank you for your time.
Test.zip (359 KB)

"std::cout << “number of triangles: " << _glm->numtriangles << std::endl; //return 12 like the original in device emulation or a strange big value using the gpu”

of course that doesn't work: _glm holds an address in GPU memory, so it cannot be dereferenced transparently by host code. you have to copy the data back into host memory (or read it from a kernel) before you can print it.
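
for illustration, a minimal sketch of the usual pattern (assuming GLMmodel is a plain-old-data struct and _glm holds the device address returned by cudaMalloc): copy the struct back into host memory first, then read the field from the host copy.

GLMmodel host_copy; // ordinary host-side struct to receive the data
// bring the device copy back into host memory before touching any field
cudaMemcpy(&host_copy, _glm, sizeof(GLMmodel), cudaMemcpyDeviceToHost);
std::cout << "number of triangles: " << host_copy.numtriangles << std::endl; // now prints 12

(in emulation mode the "device" allocation is really just host memory, which is why dereferencing _glm directly appeared to work there.)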

thank you for your time.

what a stupid mistake/lack of knowledge on my part.

i guess i'm still trying to get into the cuda world.

however, i think it’s a bit odd that device emulation mode lets you do things that cannot run in real mode. what’s the purpose of device emulation then? advanced cuda users?

thank you once again.

Device emulation mode is a quick-and-dirty feature which compiles the device code using the host compiler with some preprocessor tricks. It was intended as an easy way to debug the logic of your code using your favorite debugger or printf(). (Device emulation mode predates cuda-gdb, which is a better choice if available for your platform.) It makes no attempt to actually simulate a real CUDA device to discover “CUDA usage bugs,” since correctly simulating a device is non-trivial.
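
As a concrete example of the kind of debugging it enabled (a hypothetical sketch, assuming the code is built with nvcc's -deviceemu flag, which selected emulation mode in the toolkits of that era): because the "device" code is really compiled by the host compiler and run on the CPU, a kernel can call host functions such as printf() directly.

#include <cstdio>

// hypothetical debugging kernel: only works in a -deviceemu build, where
// "device" code actually runs on the CPU and may call host functions
__global__ void debug_triangles(GLMmodel *glm)
{
    printf("numtriangles = %u\n", glm->numtriangles);
}

(On real hardware, calling printf() from device code only became possible later, on Fermi-class GPUs.)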

Fortunately, other people are tackling this problem. A pretty amazing program mentioned in the forum here is called Ocelot:

http://code.google.com/p/gpuocelot/

It really does try to emulate a CUDA device, and can detect a number of usage bugs. (It does lots of other cool stuff, too!)