I have some doubts regarding the following CUDA device properties:
cudaDevAttrUnifiedAddressing
canUseHostPointerForRegisteredMem
cudaDevAttrHostRegisterSupported
cudaDevAttrUnifiedAddressing: indicates whether the device supports UVA (Unified Virtual Addressing): this means that GPU and CPU memory pointers live in a single virtual address space shared by host and device
canUseHostPointerForRegisteredMem: from CUDA Runtime API
1, if the device can access host registered memory at the same virtual address as the CPU, 0 otherwise
Is this attribute strictly tied to pointers registered via the host function: __host__ cudaError_t cudaHostRegister (void *ptr, size_t size, unsigned int flags)
or is it also valid for a pinned area allocated via the host function: __host__ cudaError_t cudaHostAlloc (void **pHost, size_t size, unsigned int flags)?
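A minimal sketch of how these three attributes can be queried at runtime with cudaDeviceGetAttribute (device index 0 is an assumption):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int uva = 0, regPtr = 0, regSupported = 0;
    // Query the three attributes discussed above for device 0.
    cudaDeviceGetAttribute(&uva, cudaDevAttrUnifiedAddressing, 0);
    cudaDeviceGetAttribute(&regPtr,
                           cudaDevAttrCanUseHostPointerForRegisteredMem, 0);
    cudaDeviceGetAttribute(&regSupported,
                           cudaDevAttrHostRegisterSupported, 0);
    printf("UVA: %d, canUseHostPointerForRegisteredMem: %d, "
           "hostRegisterSupported: %d\n", uva, regPtr, regSupported);
    return 0;
}
```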
For example, suppose the following scenario (I don't know if it's possible...):
once cudaHostAlloc(...) has been called, thanks to UVA support it seems not strictly necessary to call cudaHostGetDevicePointer() in order to retrieve a device pointer for the host pinned allocation before using it in a CUDA kernel.
However, canUseHostPointerForRegisteredMem = 0 would indicate that calling it is mandatory, unless canUseHostPointerForRegisteredMem refers only to pointers registered via the cudaHostRegister(...) function.
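A hedged sketch of the cudaHostAlloc case under UVA: the host pointer returned by cudaHostAlloc is passed straight to the kernel, with no cudaHostGetDevicePointer() call (kernel name and sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void incr(int *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    const int n = 256;
    int *h = nullptr;
    cudaHostAlloc(&h, n * sizeof(int), cudaHostAllocDefault);
    for (int i = 0; i < n; ++i) h[i] = i;
    // Under UVA, the host pointer itself is valid on the device,
    // so no cudaHostGetDevicePointer() call is needed here.
    incr<<<(n + 255) / 256, 256>>>(h, n);
    cudaDeviceSynchronize();
    printf("h[0] = %d\n", h[0]);  // kernel wrote through the same pointer
    cudaFreeHost(h);
    return 0;
}
```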
All 64-bit OS implementations of CUDA automatically operate in a UVA regime.
In a UVA regime, with one exception, any memory allocated using cudaHostAlloc or cudaMallocHost automatically has the property that the device pointer and the host pointer used to access that allocation are numerically the same and interchangeable.
Allocations registered via cudaHostRegister do not automatically have this property.
For a pinned host allocation in a UVA regime, you don't need to look at device properties to answer this question. We have established that via the bullet points above, extracted from the doc section I linked.
I would say so, because we don't need to look at device properties to answer the question of whether the pointer is interchangeable for host or device usage.
Correct, with the one exception called out in the section I linked.
Right, we have now established in a couple of different ways that this could only apply to memory obtained via cudaHostRegister.
So, to sum up:
in a UVA context (Kepler generation or later + 64-bit process, i.e. cudaDevAttrUnifiedAddressing = 1), host/device pointers share the same virtual address space and, if allocated using cudaHostAlloc or cudaMallocHost, the same pointer is accessible from both device and host.
The only exception is for host pointers registered via cudaHostRegister(...).
In this case, two different scenarios exist:
canUseHostPointerForRegisteredMem = 1: same as above, the host pointer is usable directly on the device.
canUseHostPointerForRegisteredMem = 0: I have to call cudaHostGetDevicePointer(...) in order to use the pointer's data on the device.
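The cudaHostRegister branch above can be sketched as follows (page size, kernel, and flags are assumptions for illustration):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void touch(int *p) { p[0] = 42; }

int main() {
    // Plain pageable allocation, page-aligned as an assumption to keep
    // cudaHostRegister happy on older CUDA versions.
    int *h = (int *)aligned_alloc(4096, 4096);
    cudaHostRegister(h, 4096, cudaHostRegisterDefault);

    int same = 0;
    cudaDeviceGetAttribute(&same,
        cudaDevAttrCanUseHostPointerForRegisteredMem, 0);

    int *d = h;                  // usable directly only if attribute == 1
    if (!same)                   // otherwise fetch the device-side alias
        cudaHostGetDevicePointer((void **)&d, h, 0);

    touch<<<1, 1>>>(d);
    cudaDeviceSynchronize();
    printf("h[0] = %d\n", h[0]);

    cudaHostUnregister(h);
    free(h);
    return 0;
}
```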