Working in emulation but not in device mode

Fede1 · December 30, 2008, 4:17pm

Hi,

I’m developing an image processing application under CUDA.
The program works correctly in device emulation mode (make emu=1, with or without dbg=1) and produces the correct resulting image. When it runs on the device, however, the resulting image contains a “NaN” in every pixel.
This happens independently of the number of threads (I tried from 1 to 128).

Do you have any idea what the problem could be?

I work with a GeForce 9800 GTX under Fedora 9. The host is an AMD 64 3200+.

Thank you in advance.
Federico

E.D_Riedijk · December 30, 2008, 5:26pm

There are a LOT of possibilities. Do you check for errors after your kernel call? (CUT_CHECK_ERROR macro, use dbg=1)
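For readers without the SDK's cutil macros, the check suggested above can be sketched with plain runtime API calls. This is only an illustration: scaleKernel and launchAndCheck are hypothetical names, and cudaDeviceSynchronize assumes a newer toolkit (toolkits from the time of this thread used cudaThreadSynchronize for the same purpose).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to have something to launch.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Launches the kernel and checks for errors in two places:
// right after the launch, and after the kernel has finished running.
cudaError_t launchAndCheck(float *d_data, int n, float factor)
{
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    scaleKernel<<<blocks, threads>>>(d_data, n, factor);

    // Catches launch failures such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Kernel launches are asynchronous, so errors raised while the
    // kernel runs only show up once the host waits for it.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

A clean result here only says that the kernel launched and ran; it says nothing about whether the data it read was initialized.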

Fede1 · December 30, 2008, 5:45pm

Yes, I do, and it is always OK.

And I always get “cudaSuccess” as the return value after cudaMallocPitch, cudaMemset, and cudaMemcpy2D.

Thank you.

E.D_Riedijk · December 30, 2008, 6:40pm

Well, then I would first try writing a constant value to your pixels, then a value dependent on threadIdx. If that is all OK, you probably have an error in your calculation code.
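The two test steps described above might look like the following pair of kernels. This is a minimal sketch that assumes the pixels sit in one contiguous float buffer; the kernel names and parameters are made up for illustration and are not from the original program.

```cuda
#include <cuda_runtime.h>

// Step 1: every pixel gets the same constant, independent of any input.
__global__ void fillConstant(float *pixels, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pixels[i] = value;
}

// Step 2: every pixel records the index of the thread that wrote it.
__global__ void fillThreadIdx(float *pixels, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pixels[i] = (float)threadIdx.x;
}
```

If the constant comes back intact after copying the buffer to the host, the allocation and copy path is probably fine. If the thread indices also come back as expected, the indexing is probably fine too, and the calculation itself becomes the main suspect.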

Fede1 · January 8, 2009, 9:59am

After some tests, the problem seems to be not in the computation but already in the initialization of the data.

I now have just a program that allocates matrices and sets them to 0 (float), with the following code:

...

CUDA_SAFE_CALL( CudaErr = cudaMallocPitch( (void **)&VolPtr, &VolPitch, NVoxelsZ*sizeof(float), NVoxelsY*NVoxelsX) );

CUDA_SAFE_CALL( CudaErr = cudaMemset2D(VolPtr, VolPitch, (float)0.0, NVoxelsZ, NVoxelsX*NVoxelsY) );

float *TransVol;

TransVol = (float *) malloc(NVoxelsX*NVoxelsY*VolPitch);

CUDA_SAFE_CALL( CudaErr = cudaMemcpy( (void*)TransVol, (void*)VolPtr, NVoxelsX*NVoxelsY*VolPitch, cudaMemcpyDeviceToHost) );

...

In emulation mode, TransVol contains only 0.0’s, as expected. In device mode (release or debug) it contains only NaNs. I really have no idea why this happens!

At the moment, NVoxelsX = NVoxelsY = 500 and NVoxelsZ = 1. The returned value for VolPitch is 64.

Fede1 · January 8, 2009, 10:17am

I solved the NaN problem!

It was an incorrect use of the cudaMemcpy call. After correcting the last line as follows:

CUDA_SAFE_CALL( CudaErr = cudaMemcpy2D( TransVol, VolPitch, VolPtr, VolPitch, NVoxelsZ, NVoxelsX*NVoxelsY, cudaMemcpyDeviceToHost) );

it works.
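One detail worth double-checking in the calls quoted in this thread: the width arguments of cudaMallocPitch, cudaMemset2D and cudaMemcpy2D are given in bytes, not in elements, and the value argument of cudaMemset2D is an int applied to each byte. Passing NVoxelsZ as the width would therefore cover only one byte per row when NVoxelsZ = 1. The sketch below shows the same allocate, zero, copy-back sequence with byte widths throughout. The function name zeroedPitchedCopy and its parameters are hypothetical, and the host buffer is tightly packed, which is a choice made here rather than something the original code does.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Allocates a pitched rows x cols float array on the device, zeroes it,
// and copies it into a tightly packed host buffer.
// Returns NULL on failure; the caller releases the result with free().
float *zeroedPitchedCopy(size_t rows, size_t cols)
{
    const size_t widthBytes = cols * sizeof(float);  // row width in BYTES
    float *d_vol = NULL;
    size_t pitch = 0;

    float *h_vol = (float *)malloc(rows * widthBytes);
    if (h_vol == NULL)
        return NULL;
    // Fill the host buffer with 0xFF bytes (a NaN bit pattern for float),
    // so any element the copy does not reach stands out.
    memset(h_vol, 0xFF, rows * widthBytes);

    cudaError_t err = cudaMallocPitch((void **)&d_vol, &pitch, widthBytes, rows);
    if (err == cudaSuccess)
        err = cudaMemset2D(d_vol, pitch, 0, widthBytes, rows);
    if (err == cudaSuccess)
        // Destination pitch is widthBytes because the host rows are packed;
        // source pitch is whatever cudaMallocPitch returned.
        err = cudaMemcpy2D(h_vol, widthBytes, d_vol, pitch,
                           widthBytes, rows, cudaMemcpyDeviceToHost);

    cudaFree(d_vol);  // harmless if d_vol is still NULL
    if (err != cudaSuccess) {
        free(h_vol);
        return NULL;
    }
    return h_vol;
}
```

With the sizes from this thread, the call would be zeroedPitchedCopy(500 * 500, 1), that is, 250000 rows of one float each.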
