how to make zero copy work


I’m kind of a CUDA newbie, so please bear with me.
Recently, I ported some existing C code to CUDA and got a substantial speedup. The code calls the same kernel several hundred times, and each time I have to send a large array back to the host. Since I keep sending the same array back and forth, from what I’ve read it seemed like using zero copy on that array would be a good idea.

So, I’m trying to use the simpleZeroCopy example from the SDK (which runs fine on my machine) as a guide, but when I try to run the code in my project, I get an error during the memory allocation. Here’s basically what I’m doing:

float *Drv, *Hrv;

cudaHostAlloc((void **)&Hrv, sizeof(float) * arraySize, cudaHostAllocMapped); // error occurs here

for (int i = 0; i < arraySize; i++)
    Hrv[i] = 0;

cudaHostGetDevicePointer((void **)&Drv, (void *)Hrv, 0);

for (int i = 0; i < manyIterations; i++) {
    kernel<<<blocksPerGrid, threadsPerBlock>>>(Drv, someOtherStuff);
    // do some stuff on the rv array
}

The data size I’m working with for the zero copy is about 2048 * 512 floats, which is the same size as the data from the simpleZeroCopy example. Anybody see what I’m doing wrong? Thanks
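(For anyone debugging something similar: a minimal sketch of the allocation path above with per-call error checking, so the failing call and its actual error string are visible. The `check` helper and `main` scaffolding are illustrative, not from the original post; `arraySize` matches the size the poster mentions.)

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: print a readable message and abort if a CUDA call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    // Mapped (zero-copy) allocations require this flag to be set
    // before the CUDA context is created.
    check(cudaSetDeviceFlags(cudaDeviceMapHost), "cudaSetDeviceFlags");

    const size_t arraySize = 2048 * 512;
    float *Hrv = NULL, *Drv = NULL;

    check(cudaHostAlloc((void **)&Hrv, sizeof(float) * arraySize,
                        cudaHostAllocMapped), "cudaHostAlloc");
    check(cudaHostGetDevicePointer((void **)&Drv, Hrv, 0),
          "cudaHostGetDevicePointer");

    // ... launch kernels using Drv ...

    check(cudaFreeHost(Hrv), "cudaFreeHost");
    return 0;
}
```

Checking each return value directly (rather than calling cudaGetLastError() once at the end) tells you exactly which call is the one that fails.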

What error do you actually get, and what GPU and OS are you using?

I’m running a GTX 285 on Windows XP.

cudaGetLastError() just returns “unknown error”

what driver and toolkit are you running?

Toolkit version 2.3, driver 190.38

Anyone have any ideas on this? Am I going about it the right way?

Is that cudaHostAlloc the first API call you make that isn’t related to device enumeration or selection? (i.e., do you already have a context when you call cudaSetDeviceFlags?)

I’ve been a little confused when I’ve read about the idea of contexts on the forums before. Here are all the CUDA-related calls that I’ve made up to the point of the error:


cudaEvent_t start_event, stop_event;
float timer;

cudaEventRecord(start_event, 0);

char *device = NULL;
unsigned int flags;
cudaDeviceProp deviceProp;
int idev = 0, deviceCount;

cudaGetDeviceProperties(&deviceProp, idev);
if (!deviceProp.canMapHostMemory)
    fprintf(stderr, "Device %d cannot map host memory!\n", idev);

cudaSetDeviceFlags(cudaDeviceMapHost);

// declare device variables
float *Dx, *Ddata, *Drv, *Dx1, *Dx2, *Dshifts, *Hrv;

flags = cudaHostAllocMapped;
cudaHostAlloc((void **)&Hrv, sizeof(float) * xElem * shiftsSize, flags);
cudaMalloc((void **)&Dx, sizeof(float) * xElem);
cudaMalloc((void **)&Ddata, sizeof(float) * xElem * yElem);
checkCUDAError("memory allocation");

Thanks, tmurray for your interest in helping me out so far.

cudaEventRecord will create a context before you set the device flags. It shouldn’t be returning unknown error, but I guess that’s a test hole we have. You need to set the device flags and the device before any other CUDA calls.
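(A sketch of the ordering this implies, for anyone who finds this thread later. The variable names are illustrative; the point is simply that cudaSetDevice/cudaSetDeviceFlags come before any call that can create the context, including the event calls used for timing.)

```c
// 1. Select the device and set flags FIRST. These must precede any call
//    that creates the CUDA context (cudaEventCreate/Record, cudaMalloc, ...).
cudaSetDevice(0);
cudaSetDeviceFlags(cudaDeviceMapHost);

// 2. Only now start the timer. This is the call that creates the context,
//    and the context now comes up with mapped-memory support enabled.
cudaEvent_t start_event, stop_event;
cudaEventCreate(&start_event);
cudaEventCreate(&stop_event);
cudaEventRecord(start_event, 0);

// 3. The mapped (zero-copy) allocation now succeeds.
float *Hrv = NULL, *Drv = NULL;
cudaHostAlloc((void **)&Hrv, sizeof(float) * arraySize, cudaHostAllocMapped);
cudaHostGetDevicePointer((void **)&Drv, Hrv, 0);
```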

Hey, great! I disabled the timer and it’s working now. Thanks for the help