I have a multi-GPU application that uses a thread pool and a memory pool. Each thread controls kernels on one of the GPUs, and the memory comes from a common pool. The application works fine as long as I don't try to do zero-copy.
If I replace the memory pool with one allocated using cudaHostAlloc, and replace the cudaMemcpy and cudaMalloc calls with cudaHostGetDevicePointer, then cudaHostGetDevicePointer fails with invalid argument (cudaErrorInvalidValue, to be precise).
Before calling cudaHostAlloc, I call cudaSetDevice and cudaSetDeviceFlags(cudaDeviceMapHost).
cudaHostAlloc was invoked with the flags cudaHostAllocMapped | cudaHostAllocPortable.
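A minimal sketch of the per-thread setup order described above (error handling trimmed; the helper name and one-thread-per-device assumption are mine, not from the original post):

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Allocate a pinned, mapped, portable pool and fetch this device's
 * alias for it. Assumes the calling thread owns exactly one device. */
void *alloc_mapped_pool(int device, size_t bytes, void **dev_ptr)
{
    void *host_ptr = NULL;

    /* cudaSetDeviceFlags must run before the context is created,
     * i.e. before the first runtime call that touches this device. */
    cudaSetDevice(device);
    cudaSetDeviceFlags(cudaDeviceMapHost);

    /* Pinned + mapped + portable: usable from all CUDA contexts. */
    cudaHostAlloc(&host_ptr, bytes,
                  cudaHostAllocMapped | cudaHostAllocPortable);

    /* Device-side pointer to the same memory (zero-copy). */
    cudaHostGetDevicePointer(dev_ptr, host_ptr, 0);
    return host_ptr;
}
```

If cudaHostGetDevicePointer still fails here, the flag ordering is the first thing to check: a context created before cudaSetDeviceFlags will not have host mapping enabled.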
I wonder whether zero-copy works in this situation, or whether I have to create a memory pool per thread.
I am using 2x Tesla C1060 and one Quadro FX 370. OS: openSUSE 11.0, kernel 126.96.36.199-0.1, NVIDIA driver 185.18.08-beta, CUDA toolkit 2.2.
Very interesting, as we will be building a multi-GPU app like this in the near future: one host memory region that is read-only and accessed by all GPUs, plus N read-write regions each allocated to a single GPU.
I don't think the Quadro FX 370 supports zero-copy. Check the canMapHostMemory member of the cudaDeviceProp structure returned by cudaGetDeviceProperties().
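A quick sketch of that check across all devices (just a diagnostic, not part of the original app):

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Print whether each installed device can map pinned host memory
 * into its address space (the prerequisite for zero-copy). */
int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d (%s): canMapHostMemory = %d\n",
               dev, prop.name, prop.canMapHostMemory);
    }
    return 0;
}
```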
cudaHostGetDevicePointer() should succeed for the C1060s. The Quadro is the likely failure point: a portable allocation may end up mapped into some devices' address spaces and not others, and cudaHostGetDevicePointer() fails on any device where the host memory is not mapped.
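One way to keep a single shared pool in that mixed setup is to branch per device: use the zero-copy pointer where canMapHostMemory is set, and fall back to an explicit device buffer plus cudaMemcpy on devices that cannot map host memory (e.g. the FX 370). The helper below is a hypothetical sketch of that idea, not code from the thread:

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Obtain a usable device pointer for a pinned, portable host buffer.
 * Sets *needs_copy when the caller must keep the device copy in sync
 * with cudaMemcpy instead of relying on zero-copy. */
cudaError_t get_device_view(int device, void *host_ptr, size_t bytes,
                            void **dev_ptr, int *needs_copy)
{
    struct cudaDeviceProp prop;
    cudaError_t err;

    cudaSetDevice(device);
    cudaGetDeviceProperties(&prop, device);

    if (prop.canMapHostMemory) {
        *needs_copy = 0;  /* zero-copy path */
        return cudaHostGetDevicePointer(dev_ptr, host_ptr, 0);
    }

    *needs_copy = 1;      /* staging path for non-mapping devices */
    err = cudaMalloc(dev_ptr, bytes);
    if (err != cudaSuccess)
        return err;
    /* The pinned source still gives fast, async-capable copies. */
    return cudaMemcpy(*dev_ptr, host_ptr, bytes,
                      cudaMemcpyHostToDevice);
}
```

This keeps one common memory pool while avoiding the per-thread pools the original poster was hoping not to need.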