I have a simple program that works in emulation mode. I am trying to use a device function to change some data (*d_data) passed from the host side:
float3* d_data=NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_data, size));
kernel<<<gridSize, blockSize>>>(d_data);
It works well in emulation. But when I switch to GPU mode, d_data's values haven't changed after the kernel call. Do I need to copy the results from device to host, like this?
float3* h_odata=NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &h_odata, size));
CUDA_SAFE_CALL( cudaMemcpy( h_odata, d_data, size, cudaMemcpyDeviceToHost) );
But this doesn't work either. :mad: Please help, thanks!
-timothy
You should copy from the device AFTER the kernel launch!
It would be safest to issue a cudaThreadSynchronize() after the kernel launch and then follow it up with the copy.
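A minimal sketch of that ordering, assuming the kernel, launch configuration, d_data, and size from the original post (note that h_odata must be ordinary host memory here, not a cudaMalloc pointer; also, cudaThreadSynchronize() was the API of the day, and modern CUDA uses cudaDeviceSynchronize() instead):

```cuda
kernel<<<gridSize, blockSize>>>(d_data);

// Block until the kernel has actually finished before reading results.
CUDA_SAFE_CALL(cudaThreadSynchronize());

// Destination for a DeviceToHost copy must be HOST memory.
float3* h_odata = (float3*)malloc(size);
CUDA_SAFE_CALL(cudaMemcpy(h_odata, d_data, size, cudaMemcpyDeviceToHost));
```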
pfccpp
August 7, 2008, 6:51am
3
Your second block of code:
float3* h_odata=NULL;
CUDA_SAFE_CALL( cudaMalloc( (void**) &h_odata, size));
CUDA_SAFE_CALL( cudaMemcpy( h_odata, d_data, size, cudaMemcpyDeviceToHost) );
You are allocating memory for h_odata with cudaMalloc → h_odata points to device memory.
You are then calling cudaMemcpy with h_odata as the first argument and cudaMemcpyDeviceToHost as the last one → CUDA assumes h_odata points to host memory, and that's not the case in your example.
You must do the following:
float3* h_data;
float3* d_data;
h_data=(float3*)malloc(size); // Host allocation
cudaMalloc((void**)&d_data, size); // Device allocation
//Write the test data in h_data
...
cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice); // Upload the data
kernel<<<gridSize, blockSize>>>(d_data); // Compute the data
cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost); // Download the data
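Put together, a complete minimal test program might look like the sketch below. The kernel body is a made-up placeholder (it just doubles each component) so the round trip is observable; the original poster's actual kernel was not shown:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Placeholder kernel: doubles each component of d_data.
__global__ void kernel(float3* d_data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_data[i].x *= 2.0f;
        d_data[i].y *= 2.0f;
        d_data[i].z *= 2.0f;
    }
}

int main()
{
    const int n = 256;
    const size_t size = n * sizeof(float3);

    float3* h_data = (float3*)malloc(size);        // Host allocation
    for (int i = 0; i < n; ++i)
        h_data[i] = make_float3(1.0f, 2.0f, 3.0f); // Test data

    float3* d_data = NULL;
    cudaMalloc((void**)&d_data, size);             // Device allocation
    cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice); // Upload

    kernel<<<(n + 255) / 256, 256>>>(d_data, n);   // Compute

    cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost); // Download
    printf("h_data[0].x = %f\n", h_data[0].x);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```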
VrahoK
August 7, 2008, 9:14am
4
Correct, just one addition: for best transfer performance you should use page-locked (pinned) host memory, allocated by:
float3* h_data;
cudaMallocHost((void**)&h_data, size); // Host allocation
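A short sketch of the pinned-memory variant (h_data and size as in the earlier example). The one extra thing to remember is that a cudaMallocHost buffer must be released with cudaFreeHost, not free():

```cuda
float3* h_data = NULL;
// Page-locked (pinned) host allocation: transfers to/from this buffer
// can be DMA'd directly and are typically faster than pageable memory.
CUDA_SAFE_CALL(cudaMallocHost((void**)&h_data, size));

// ... use h_data with cudaMemcpy exactly as before ...

// Pinned memory is freed with cudaFreeHost, not free().
CUDA_SAFE_CALL(cudaFreeHost(h_data));
```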
Thank you so much guys, it works! :laugh: