I’m kind of a CUDA newbie, so please bear with me.
Recently, I ported some existing C code to CUDA and got a substantial speedup. The code calls the same kernel several hundred times, and each time I have to copy a large array back to the host. Since I keep sending the same array back and forth, I thought from what I’ve read that using zero copy (mapped pinned host memory) on that array would be a good idea.
So, I’m trying to use the simpleZeroCopy example from the SDK (which runs fine on my machine) as a guide, but when I try to run the code in my project, I get an error during the memory allocation. Here’s basically what I’m doing:
float *Drv, *Hrv; // device-side and host-side pointers to the same array
cudaHostAlloc((void **)&Hrv, sizeof(float) * arraySize, cudaHostAllocMapped); // error occurs here
for (int i = 0; i < arraySize; i++)
    Hrv[i] = 0;
cudaHostGetDevicePointer((void **)&Drv, (void *)Hrv, 0);
for (int i = 0; i < manyIterations; i++)
    kernel<<<blocksPerGrid, threadsPerBlock>>>(Drv, someOtherStuff);
// do some stuff on the rv array
The array I’m using for the zero copy is about 2048 * 512 floats (roughly 4 MB), which is the same size as the data in the simpleZeroCopy example. Does anybody see what I’m doing wrong? Thanks.
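In case it helps to reproduce the problem, here is a minimal self-contained sketch of the same pattern with every runtime call checked, so the actual error string gets printed instead of a silent failure. The CHECK macro, the kernel body, and the launch configuration are placeholders I made up for illustration. The cudaSetDeviceFlags(cudaDeviceMapHost) call is an assumption on my part: as far as I can tell, the SDK sample sets that flag before any other call that touches the device, and I have not confirmed that its absence is what breaks my project.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error string and exit on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed at %s:%d: %s\n", #call,        \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the real one: adds 1 to each element.
__global__ void kernel(float *rv, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        rv[i] += 1.0f;
}

int main()
{
    const int arraySize = 2048 * 512;
    const int manyIterations = 100;   // placeholder iteration count
    const int threadsPerBlock = 256;  // placeholder launch configuration
    const int blocksPerGrid = (arraySize + threadsPerBlock - 1) / threadsPerBlock;

    // Check that device 0 can map host memory at all.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device 0 cannot map host memory\n");
        return EXIT_FAILURE;
    }

    // Assumption: request mapped host memory before the device is first used.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    float *Hrv = NULL;  // host pointer to the mapped pinned allocation
    float *Drv = NULL;  // device pointer aliasing the same memory
    CHECK(cudaHostAlloc((void **)&Hrv, sizeof(float) * arraySize,
                        cudaHostAllocMapped));
    for (int i = 0; i < arraySize; i++)
        Hrv[i] = 0.0f;
    CHECK(cudaHostGetDevicePointer((void **)&Drv, (void *)Hrv, 0));

    for (int it = 0; it < manyIterations; it++) {
        kernel<<<blocksPerGrid, threadsPerBlock>>>(Drv, arraySize);
        CHECK(cudaGetLastError());       // catches launch errors
        CHECK(cudaDeviceSynchronize());  // wait before the host reads Hrv
        // host-side work on Hrv would go here
    }

    printf("Hrv[0] = %f\n", Hrv[0]);
    CHECK(cudaFreeHost(Hrv));
    return 0;
}
```

The cudaDeviceSynchronize() inside the loop is there because kernel launches are asynchronous, so without it the host could read Hrv before the kernel has finished writing to it.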