Strange Problem cudaMemcpy on GPU vs Device Emulation

jpfc · August 27, 2009, 6:41am

hello,

i have a small cuda program like this:

GLMmodel *_glm;
GLMmodel *v = (GLMmodel *)a; // a is a function parameter of type void*. i’ve tried using it directly in cudaMemcpy, same result.
size_t size = sizeof(GLMmodel);

cudaError_t mem_alloc, mem_transfer;

mem_alloc = cudaMalloc((void**) &_glm, size);
std::cout << "mem_alloc: " << mem_alloc << std::endl;

mem_transfer = cudaMemcpy((void *)_glm, (void *)v, size, cudaMemcpyHostToDevice);
std::cout << "mem_transfer: " << mem_transfer << std::endl;

std::cout << "number of triangles: " << _glm->numtriangles << std::endl; //return 12 like the original in device emulation or a strange big value using the gpu

everything returns success, but when i read that GLuint variable through *_glm it gives 1378016 or something when it should give 12.
i’ve checked the sizes, the source and destination values, the casts, memsets, cudaGetLastError and similar debug info (all success) and i don’t understand this difference.
i even tried the cuda visual profiler and it shows a completed memory copy with the correct size…
the worst part is, if i use device emulation, the GLuint variable in *_glm is correct and returns 12 (the program behaves correctly even some lines later in a kernel call). this is very strange, and having this function (a very important one) behave this way is a major problem for my development.
i’ve read two other threads but no one seems to have found a solution :(.
i am working on a mac mini with a geforce 9400M, mac os x 10.5.8, the latest versions of the toolkit, driver and sdk, and i am able to run some of the sdk examples.
i’m sorry if i am making a stupid mistake.

thank you very much.

PTThompson · August 27, 2009, 1:59pm

two things: 1) I presume that when you say a is a function parameter, you mean it is passed from main into the function, correct? (as opposed to a function pointer, which can’t be used.) 2) is the code compact enough that you can post it here, and is it portable? I have a couple of machines I could try it on to check whether it’s actually a bug in the code rather than a driver/api issue, if you think that would help.

jpfc · August 28, 2009, 5:18am

hello PTThompson,

1) yes, you are correct, my english is another big bug :)

thank you very much for your help! i’ve attached a zip file containing the binaries and project files for xcode 3.1.3 and visual studio 2005. both are configured with device emulation off. in that example the number of triangles should be 12, but with emulation off it gives strange values like ‘12332423’, or ‘0’ on windows (with an exception). when i compiled and ran my code on windows with emulation off, it crashed after cudaMemcpy when i tried to read the value (number of triangles) through the copied pointer. the visual studio debugger stopped right after the line where i print the number of triangles from the newly copied pointer, and it showed ??? for every member of the pointer’s struct, meaning it was probably not initialized… but the cudaMemcpy call returned success. with device emulation on everything runs ok. i don’t understand…

once again, thank you.
BigBug_.zip (2.54 MB)

jpfc · August 29, 2009, 9:08am

hello there,

i tested this in a new project (attached) based on a sample file from the visual studio template. this time the copy was from device to host, and the problem persists.
this is indeed a very strange situation.
until there’s a new release, or until i or someone else finds the problem and/or the solution, i’m developing in device emulation mode… :(

thank you for your time.
Test.zip (359 KB)

tmurray · August 29, 2009, 4:59pm

“std::cout << "number of triangles: " << _glm->numtriangles << std::endl; //return 12 like the original in device emulation or a strange big value using the gpu”

of course that doesn’t work: _glm points to memory that resides on the GPU, so the CPU cannot dereference it transparently. to read numtriangles on the host, copy the struct back first with cudaMemcpy and cudaMemcpyDeviceToHost.
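
A minimal sketch of the copy-back pattern, in case it helps later readers. The names Model and roundTrip are made up for illustration and are not from the original project. Model is a one-field stand-in for GLMmodel, under the assumption that only numtriangles matters here. Only standard CUDA runtime calls are used (cudaMalloc, cudaMemcpy, cudaFree, cudaGetErrorString).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for GLMmodel: only the field discussed in this thread.
struct Model {
    unsigned int numtriangles;
};

// Prints a readable message and returns false if a CUDA call failed.
static bool ok(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Sends h_src to the GPU, then copies it back into h_dst so the host can read it.
bool roundTrip(const Model &h_src, Model &h_dst)
{
    Model *d_model = NULL;  // will hold a device address, not a host one
    if (!ok(cudaMalloc((void **)&d_model, sizeof(Model)), "cudaMalloc"))
        return false;

    bool good = ok(cudaMemcpy(d_model, &h_src, sizeof(Model),
                              cudaMemcpyHostToDevice),
                   "host-to-device copy");

    // Do NOT read d_model->numtriangles here: that would dereference a
    // device address on the CPU, which is the bug from the first post.

    // Instead, copy the struct back into host memory and read that copy.
    good = good && ok(cudaMemcpy(&h_dst, d_model, sizeof(Model),
                                 cudaMemcpyDeviceToHost),
                      "device-to-host copy");

    cudaFree(d_model);
    return good;
}
```

With a source Model whose numtriangles is 12, h_dst.numtriangles should read 12 after a successful call on real hardware. If the real struct holds pointers to other host arrays, those would need their own device allocations and copies, which this sketch does not cover.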

jpfc · September 5, 2009, 3:46am

thank you for your time.

what a stupid mistake/lack of information.

i guess i’m still finding my way into the cuda world.

however, i think it’s a bit odd that device emulation mode lets you do things that cannot run in real mode. what’s the purpose of device emulation then? is it meant for advanced cuda users?

thank you once again.

seibert · September 5, 2009, 4:36am

Device emulation mode is a quick-and-dirty feature which compiles the device code using the host compiler with some preprocessor tricks. It was intended as an easy way to debug the logic of your code using your favorite debugger or printf(). (Device emulation mode predates cuda-gdb, which is a better choice if available for your platform.) It makes no attempt to actually simulate a real CUDA device to discover “CUDA usage bugs,” since correctly simulating a device is non-trivial.

Fortunately, other people are tackling this problem. A pretty amazing program mentioned in the forum here is called Ocelot:

http://code.google.com/p/gpuocelot/

It really does try to emulate a CUDA device, and can detect a number of usage bugs. (It does lots of other cool stuff, too!)
