Accessing a specific NUMA node with cudaMallocManaged

hiwot · September 20, 2021, 3:38pm

I am trying to allocate memory using cudaMallocManaged and I want it to be placed on a specific NUMA node. Is there a way to do that? I am trying to access pmem (byte-addressable Intel Optane memory) using UVM. The pmem is exposed to the system as system memory in its own NUMA node, but it does not seem possible to use this memory with UVM. Is this possible?

AKravets · September 20, 2021, 3:49pm

Hi @hiwot,
Your question might be better suited to the CUDA Programming and Performance forum branch. I have moved your post there.

Robert_Crovella · September 21, 2021, 3:30am

There is no direct control over this (i.e. via the CUDA runtime API).

The way to control process memory placement on Linux is via numactl.

I won’t be able to give you a recipe for what you are describing.

hiwot · September 21, 2021, 3:36am

Using numactl did not help with allocating memory from pmem. Is it not possible to access pmem with the CUDA API?

hiwot · September 28, 2021, 4:03pm

For example, if we use cudaMallocManaged as in the following simple code

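#include <cstddef>
#include <cuda_runtime.h>

// Element-wise x = y1 + y2 using a grid-stride loop.
__global__ void add(size_t n, float *x, float *y1, float *y2)
{
  // Use 64-bit indices: n is larger than INT_MAX, so int would overflow.
  size_t index = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
  size_t stride = (size_t)blockDim.x * gridDim.x;
  for (size_t i = index; i < n; i += stride)
    x[i] = y1[i] + y2[i];
}

int main(){

  size_t N = 10000000000;  // 1e10 floats, about 40 GB per array

  float *x, *y1, *y2;

  // Managed (UVM) allocations, accessible from both host and device.
  cudaMallocManaged(&x,N*sizeof(float));
  cudaMallocManaged(&y1,N*sizeof(float));
  cudaMallocManaged(&y2,N*sizeof(float));

  // Initialize on the host; the loop counter must be 64-bit as well.
  for (size_t i = 0; i < N; i++) {
    x[i] = 0;
    y1[i] = 1.0f;
    y2[i] = 2.0f;
  }

  int blockSize = 256;
  // 39062500 blocks, which fits in the unsigned int that gridDim.x expects.
  unsigned int numBlocks = (unsigned int)((N + blockSize - 1) / blockSize);
  add<<<numBlocks, blockSize>>>(N, x, y1, y2);

  cudaDeviceSynchronize();

  cudaFree(x);
  cudaFree(y1);
  cudaFree(y2);
  return 0;
}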

and run it with numactl --membind=, the memory allocated with cudaMallocManaged is not placed on the pmem node. Is it possible to use pmem in this scenario?

njuffa · September 28, 2021, 10:06pm

This is really an operating system question, so a Linux forum is likely a better venue for your question.

As far as I know (and you should be able to confirm with a logging utility like strace) cudaMallocManaged is just a wrapper around a bunch of system calls, mostly mmap.

Searching around the internet, I find various discussions of how pmem can be used in lieu of regular DRAM, how to select pmem with numactl, etc., but as best I can tell this seems to be a work in progress. I don’t see any confirmation that at present pmem can be used completely transparently as system memory. But then I am not a Linux specialist and not necessarily familiar with the state of the art.

Yuki_Ni · February 15, 2022, 5:37am

Hi @hiwot,

I have been following ticket 3390407, which you reported internally. Could you please refer to “If you have a problem, PLEASE read this first” and collect an nvidia-bug-report log for us, as requested by our engineers? I’ll send you a Google Drive link for uploading the system log in case there are any security concerns. Please pay attention to our system emails. Thanks.