Can pinned memory be accessed by the device?

I was learning about pinned memory and zero-copy memory.

But I found that the device can access pinned memory without cudaMemcpy.

My GPU is an A800 and the CUDA version is 12. My code is as follows:

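#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread prints one element of a.
__global__ void sumArraysGPU(float *a)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  printf("%f\n", a[i]);
}

int main(int argc, char **argv)
{
  int dev = 0;
  cudaSetDevice(dev);

  int nElem = 32;
  int nByte = sizeof(float) * nElem;

  // a_h: ordinary pageable host memory
  float *a_h = (float*)malloc(nByte);
  // a_d: despite its name, this is pinned HOST memory, not device memory
  float *a_d;
  cudaMallocHost((float**)&a_d, nByte);

  for (int i = 0; i < nElem; i++)
    a_h[i] = i;

  cudaError_t error = cudaMemcpy(a_d, a_h, nByte, cudaMemcpyHostToDevice);

  if (error == cudaSuccess) {
    printf("Success\n");
  }

  sumArraysGPU<<<1, 32>>>(a_d);
  // Wait for the kernel to finish (this also flushes its printf output)
  // before the pinned buffer it reads is freed.
  cudaDeviceSynchronize();

  free(a_h);
  cudaFreeHost(a_d);

  return 0;
}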

I have some questions about the code:

  1. In my understanding, pinned memory is located in CPU DRAM, unlike zero-copy memory, which has two addresses, one in CPU DRAM and one in GPU DRAM. So why doesn't the statement cudaMemcpy(a_d,a_h,nByte,cudaMemcpyHostToDevice) throw an error? I am transferring data from CPU to CPU, but I passed the cudaMemcpyHostToDevice parameter.
  2. In the kernel function I tried to print the data, and I found that the data in a_d is the same as in a_h, which means the data transfer succeeded. But I didn't declare a_d as zero-copy memory.

Is there something wrong in my understanding?

In my opinion, we shouldn’t draw much of a distinction between “pinned” and “zero-copy”. They are roughly the same thing.

Pinning memory is the act of placing it in a kind of storage that the GPU can access directly. Unit 7 of this online training series discusses this in more detail.

Zero copy means to access this data directly in GPU kernel code, as if you were accessing device memory. Both device allocations and pinned allocations reside in the logical global space, and both types of allocations are directly accessible from GPU kernel code, without requiring cudaMemcpy, per se.
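As a minimal sketch of the zero-copy idea described above (assuming a 64-bit host with UVA and a CUDA-capable GPU; the names doubleInPlace and zeroCopyDoubleDemo are hypothetical), the kernel below reads and writes a cudaMallocHost buffer directly, and the host then reads the results with no cudaMemcpy in either direction:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place, directly in pinned host memory.
__global__ void doubleInPlace(float *data, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] *= 2.0f;
}

// Hypothetical helper: returns true if the GPU doubled every element
// of a pinned buffer without any cudaMemcpy.
bool zeroCopyDoubleDemo(int n)
{
  float *buf = nullptr;
  if (cudaMallocHost((void **)&buf, n * sizeof(float)) != cudaSuccess)
    return false;

  for (int i = 0; i < n; i++)
    buf[i] = (float)i;                       // written by the host

  // Under UVA the host pointer is passed to the kernel unchanged.
  doubleInPlace<<<(n + 255) / 256, 256>>>(buf, n);
  bool ok = cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess;  // wait before the host reads

  for (int i = 0; ok && i < n; i++)
    ok = (buf[i] == 2.0f * i);               // read by the host, no copy back

  cudaFreeHost(buf);
  return ok;
}

int main()
{
  printf("%s\n", zeroCopyDoubleDemo(1000) ? "zero-copy OK" : "zero-copy FAILED");
  return 0;
}
```

Every access from the kernel travels over the PCIe/NVLink bus, so this pattern suits data that is touched roughly once; repeatedly reused data is usually better copied into device memory first.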

Regarding pinned memory being located in CPU DRAM: yes, correct.

Regarding it being different from zero-copy memory: no, it is not different. If you are going to use a “zero-copy technique”, the data you use it with must be in pinned memory.

Pinned memory is accessible from device code, so you could say it is device-accessible, and it is a legitimate target for cudaMemcpy. I think what you will find is that you can use either cudaMemcpyHostToHost or cudaMemcpyHostToDevice. Either will work, because pinned memory can be accessed from either the host or the device.
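One way to see what the runtime thinks a_d is: a sketch, assuming CUDA 11 or later on a UVA system, that asks cudaPointerGetAttributes for the memory type of a pinned and a pageable pointer, then copies pageable data into a pinned buffer using cudaMemcpyHostToHost and cudaMemcpyDefault (the latter lets the runtime infer the direction from the pointer values). The helper names memoryTypeOf and copyIntoPinned are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: the memory type the runtime reports for ptr.
// Since CUDA 11, plain malloc'd memory reports cudaMemoryTypeUnregistered;
// older runtimes return an error instead, mapped to the same value here.
cudaMemoryType memoryTypeOf(const void *ptr)
{
  cudaPointerAttributes attr;
  if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) {
    cudaGetLastError();  // clear the recorded error
    return cudaMemoryTypeUnregistered;
  }
  return attr.type;
}

// Hypothetical helper: copies pageable -> pinned with the given kind
// and reports whether the contents arrived intact.
bool copyIntoPinned(cudaMemcpyKind kind, int n)
{
  size_t bytes = n * sizeof(float);
  float *pageable = (float *)malloc(bytes);
  float *pinned = nullptr;
  if (!pageable || cudaMallocHost((void **)&pinned, bytes) != cudaSuccess) {
    free(pageable);
    return false;
  }
  for (int i = 0; i < n; i++)
    pageable[i] = (float)i;

  bool ok = cudaMemcpy(pinned, pageable, bytes, kind) == cudaSuccess &&
            memcmp(pinned, pageable, bytes) == 0;

  free(pageable);
  cudaFreeHost(pinned);
  return ok;
}

int main()
{
  float *pinned = nullptr;
  float *pageable = (float *)malloc(16);
  cudaMallocHost((void **)&pinned, 16);

  printf("pinned reported as host memory: %s\n",
         memoryTypeOf(pinned) == cudaMemoryTypeHost ? "yes" : "no");
  printf("pageable reported as unregistered: %s\n",
         memoryTypeOf(pageable) == cudaMemoryTypeUnregistered ? "yes" : "no");
  printf("HostToHost copy: %s\n", copyIntoPinned(cudaMemcpyHostToHost, 256) ? "OK" : "FAILED");
  printf("Default copy:    %s\n", copyIntoPinned(cudaMemcpyDefault, 256) ? "OK" : "FAILED");

  cudaFreeHost(pinned);
  free(pageable);
  return 0;
}
```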

The CUDA Runtime API documentation says that the cudaHostAllocMapped flag must be passed to cudaHostAlloc to get a device mapping for pinned memory.

The documentation describes cudaHostAllocMapped as follows:

cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
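For comparison, the explicit path the documentation describes looks roughly like this sketch (the helper mappedAllocDemo and kernel fillWithIndex are hypothetical): allocate with cudaHostAllocMapped, obtain the device-side pointer with cudaHostGetDevicePointer (whose flags argument must be 0), and hand that pointer to the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes each thread's index into mapped pinned memory.
__global__ void fillWithIndex(int *out, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = i;
}

// Hypothetical helper: explicit mapped-pinned-memory path.
bool mappedAllocDemo(int n)
{
  int *h = nullptr;  // host-side pointer
  int *d = nullptr;  // device-side pointer to the same memory
  if (cudaHostAlloc((void **)&h, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
    return false;

  bool ok = cudaHostGetDevicePointer((void **)&d, h, 0) == cudaSuccess;
  if (ok) {
    fillWithIndex<<<(n + 127) / 128, 128>>>(d, n);  // kernel uses the device pointer
    ok = cudaGetLastError() == cudaSuccess &&
         cudaDeviceSynchronize() == cudaSuccess;
  }
  for (int i = 0; ok && i < n; i++)
    ok = (h[i] == i);  // host reads the same memory through h

  cudaFreeHost(h);
  return ok;
}

int main()
{
  printf("%s\n", mappedAllocDemo(500) ? "mapped OK" : "mapped FAILED");
  return 0;
}
```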

In my test code I didn’t set that flag, so why can the device access the memory?

Because on a 64-bit host OS, unified virtual addressing (UVA) is in effect, and under UVA all pinned allocations are automatically mapped, even if you don’t set the flag.
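Whether a given device runs under UVA can be queried rather than assumed. A minimal sketch, using the cudaDevAttrUnifiedAddressing attribute (the helper name hasUnifiedAddressing is hypothetical):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: 1 if device dev uses unified addressing, 0 if not, -1 on error.
int hasUnifiedAddressing(int dev)
{
  int value = 0;
  if (cudaDeviceGetAttribute(&value, cudaDevAttrUnifiedAddressing, dev) != cudaSuccess)
    return -1;
  return value;
}

int main()
{
  // UVA requires a 64-bit host process, so report the host pointer size too.
  printf("host pointer size: %zu bytes\n", sizeof(void *));

  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) {
    printf("no CUDA devices found\n");
    return 1;
  }
  for (int dev = 0; dev < count; dev++)
    printf("device %d: unifiedAddressing = %d\n", dev, hasUnifiedAddressing(dev));
  return 0;
}
```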

From here:

Automatic Mapping of Host Allocated Host Memory

All host memory allocated through all devices using cudaMallocHost() and cudaHostAlloc() is always directly accessible from all devices that support unified addressing. This is the case regardless of whether or not the flags cudaHostAllocPortable and cudaHostAllocMapped are specified.

The pointer value through which allocated host memory may be accessed in kernels on all devices that support unified addressing is the same as the pointer value through which that memory is accessed on the host. It is not necessary to call cudaHostGetDevicePointer() to get the device pointer for these allocations.
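The last paragraph can be checked directly. In this sketch (the helper devicePointerEqualsHostPointer is hypothetical, and a UVA-capable device is assumed), memory comes from cudaMallocHost with no mapping flag, and cudaHostGetDevicePointer is called anyway only to show that it returns the host pointer value unchanged:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the device pointer for an unflagged
// cudaMallocHost allocation has the same value as the host pointer.
bool devicePointerEqualsHostPointer()
{
  void *h = nullptr;
  void *d = nullptr;
  if (cudaMallocHost(&h, 1024) != cudaSuccess)  // no cudaHostAllocMapped flag
    return false;

  bool ok = cudaHostGetDevicePointer(&d, h, 0) == cudaSuccess && d == h;

  cudaFreeHost(h);
  return ok;
}

int main()
{
  printf("device pointer == host pointer: %s\n",
         devicePointerEqualsHostPointer() ? "yes" : "no");
  return 0;
}
```

This is why the original program works: a_d is passed straight to sumArraysGPU, and under UVA that host pointer is also a valid device pointer.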
