cudaHostGetDevicePointer() and Zero-Copy

Hi there,

I’ve just started working with the 2.2 beta toolkit, and have a question about the new zero-copy access. How can I detect at run-time whether it’s available for the current device, so that I can either use it or fall back to the more traditional cudaMemcpyAsync()? I’ve tried performing a cudaHostAlloc() followed by a cudaHostGetDevicePointer(), but both always return cudaSuccess.

Thanks,
Peter

Check the canMapHostMemory member of the cudaDeviceProp structure - it will be nonzero if the device can map pinned system memory.

Driver API apps may call cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY.
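For reference, the run-time check might look roughly like this (a minimal sketch against the 2.2 runtime API; error handling elided, device 0 assumed):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  /* query device 0 */

    if (prop.canMapHostMemory) {
        printf("Device can map pinned host memory - zero-copy is available\n");
    } else {
        printf("No zero-copy support - fall back to cudaMemcpyAsync()\n");
    }

    /* Driver API equivalent:
       int canMap = 0;
       cuDeviceGetAttribute(&canMap,
                            CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY, dev);
    */
    return 0;
}
```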

I’m afraid that the 32-bit 2.2 beta toolkit version of cudaDeviceProp does not have a canMapHostMemory member (or anything close to it). Is this something in the 64-bit toolkit? Also, the CUdevice_attribute type does not have a CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY member. Is this a feature that will be coming out with the final version of the toolkit? If so I can wait for it.

Finally, is the zero-copy feature pinned to a specific CUDA device version like 1.3? I can also check the version, but I did not see it in the features list.

Thanks,

Peter

Very weird. This flag is definitely in the Linux 64-bit toolkit: line 268 in cuda/include/driver_types.h. I haven’t actually tried whether it works, though.

On normal discrete GPUs, only G200 supports zero-copy. Some of the integrated chips do support it, too: see http://forums.nvidia.com/index.php?showtopic=92290

In the 32-bit Windows driver_types.h, line 268 is in the __cudaReserved[38] line:

/**
 * CUDA device properties
 */
/*DEVICE_BUILTIN*/
struct cudaDeviceProp
{
    char   name[256];                 ///< ASCII string identifying device
    size_t totalGlobalMem;            ///< Global memory available on device in bytes
    size_t sharedMemPerBlock;         ///< Shared memory available per block in bytes
    int    regsPerBlock;              ///< 32-bit registers available per block
    int    warpSize;                  ///< Warp size in threads
    size_t memPitch;                  ///< Maximum pitch in bytes allowed by memory copies
    int    maxThreadsPerBlock;        ///< Maximum number of threads per block
    int    maxThreadsDim[3];          ///< Maximum size of each dimension of a block
    int    maxGridSize[3];            ///< Maximum size of each dimension of a grid
    int    clockRate;                 ///< Clock frequency in kilohertz
    size_t totalConstMem;             ///< Constant memory available on device in bytes
    int    major;                     ///< Major compute capability
    int    minor;                     ///< Minor compute capability
    size_t textureAlignment;          ///< Alignment requirement for textures
    int    deviceOverlap;             ///< Device can concurrently copy memory and execute a kernel
    int    multiProcessorCount;       ///< Number of multiprocessors on device
    int    kernelExecTimeoutEnabled;  ///< Specified whether there is a run time limit on kernels
    int    integrated;                ///< Device is integrated as opposed to discrete
    int    __cudaReserved[38];
};

Hopefully, NVIDIA has fixed the problem in 2.2 final. If not, hopefully Tim is reading this and will double-check it before release :)

Yes, it is fixed in final.

In 2.2 beta, you only have the integrated device property (e.g., “is this MCP79 and can therefore do copy elimination as part of zero-copy”). See the big zero-copy thread in the other forum if copy elimination doesn’t mean anything to you.

We noticed the glaring oversight too late for 2.2 beta, but in final there is also canMapHostMemory, which will be true on MCP79 + Compute 1.2 or greater.
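Putting the thread together, once 2.2 final is available the full zero-copy sequence would look roughly like this (a sketch, assuming a device that reports canMapHostMemory; error checks elided):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device cannot map host memory; "
                        "fall back to cudaMemcpyAsync()\n");
        return 1;
    }

    /* Must be called before any CUDA context is created on the device. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    /* Allocate mapped pinned host memory ... */
    float *h_ptr = NULL, *d_ptr = NULL;
    cudaHostAlloc((void **)&h_ptr, 1024 * sizeof(float), cudaHostAllocMapped);

    /* ... and get the device-side alias for it. */
    cudaHostGetDevicePointer((void **)&d_ptr, h_ptr, 0);

    /* d_ptr can now be passed to kernels; accesses travel over the bus
       (or are eliminated entirely on integrated MCP79 parts). */

    cudaFreeHost(h_ptr);
    return 0;
}
```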