Accessing a specific NUMA node with cudaMallocManaged

I am trying to allocate memory with cudaMallocManaged and want it placed on a specific NUMA node. Is there a way to do that? My goal is to access pmem (byte-addressable Intel Optane memory) through UVM. The pmem is exposed to the system as system memory in its own NUMA node, but it seems this memory cannot be used through UVM. Is that possible?


Hi @hiwot,
Your question might be better suited to the CUDA Programming and Performance branch of the NVIDIA Developer Forums. I have moved your post there.

There is no direct control over this (i.e. via the CUDA runtime API).

The way to control process memory placement on Linux is via numactl.

I won’t be able to give you a recipe for what you are describing.

Using numactl did not help with allocating memory from pmem. Is it not possible to access pmem with the CUDA API?

For example, suppose we use cudaMallocManaged in the following simple code:

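#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles elements index, index + stride, ...
__global__ void add(size_t n, float *x, float *y1, float *y2)
{
  // 64-bit indices: n is larger than what an int can hold
  size_t index = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
  size_t stride = (size_t)blockDim.x * gridDim.x;
  for (size_t i = index; i < n; i += stride)
    x[i] = y1[i] + y2[i];
}

int main(){

  size_t N = 10000000000;  // 1e10 elements per array

  float *x, *y1, *y2;

  // Managed allocations; stop early if any of them fails
  if (cudaMallocManaged(&x, N*sizeof(float)) != cudaSuccess ||
      cudaMallocManaged(&y1, N*sizeof(float)) != cudaSuccess ||
      cudaMallocManaged(&y2, N*sizeof(float)) != cudaSuccess) {
    fprintf(stderr, "cudaMallocManaged failed\n");
    return 1;
  }

  // Initialize on the host; the counter must be 64-bit because N > INT_MAX
  for (size_t i = 0; i < N; i++) {
    x[i] = 0;
    y1[i] = 1.0f;
    y2[i] = 2.0f;
  }

  int blockSize = 256;
  size_t numBlocks = (N + blockSize - 1) / blockSize;
  add<<<(unsigned int)numBlocks, blockSize>>>(N, x, y1, y2);

  // Report launch or execution errors instead of ignoring them
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess)
    err = cudaDeviceSynchronize();
  if (err != cudaSuccess)
    fprintf(stderr, "add kernel failed: %s\n", cudaGetErrorString(err));

  cudaFree(x);
  cudaFree(y1);
  cudaFree(y2);
  return err == cudaSuccess ? 0 : 1;
}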

and run it with numactl --membind= then the memory allocated with cudaMallocManaged does not come from the pmem node. Is it possible to use pmem in this scenario?
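For a sense of scale, here is a small host-only sketch of the arithmetic behind the example. It only restates numbers taken from the code above (N, three arrays, block size 256); the helper names total_bytes, fits_in_int32 and blocks_needed are made up for illustration, and it assumes a 4-byte float.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::uint64_t kElements = 10000000000ULL;  // N from the example
constexpr std::uint64_t kArrays = 3;                 // x, y1, y2

// Total bytes requested across all managed arrays.
constexpr std::uint64_t total_bytes(std::uint64_t n, std::uint64_t arrays) {
    return n * arrays * sizeof(float);
}

// True when every index in [0, n) fits in a 32-bit signed int.
constexpr bool fits_in_int32(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(
                    std::numeric_limits<std::int32_t>::max());
}

// Ceiling division: thread blocks needed to give each element one thread.
std::uint64_t blocks_needed(std::uint64_t n, std::uint64_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// 3 arrays * 1e10 elements * 4 bytes = 120 GB of managed memory.
static_assert(total_bytes(kElements, kArrays) == 120000000000ULL,
              "expects 4-byte float");
// N is beyond the range of int, so loop counters and indices need 64 bits.
static_assert(!fits_in_int32(kElements), "N exceeds INT_MAX");
```

So the example asks for about 120 GB of managed memory, which will typically exceed both GPU memory and the DRAM of many hosts, and any index over the arrays needs a 64-bit type such as size_t.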

This is really an operating system question, so a Linux forum is likely a better venue for your question.

As far as I know (and you should be able to confirm this with a tracing utility like strace), cudaMallocManaged is just a wrapper around a bunch of system calls, mostly mmap.
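To make the operating-system side concrete, here is a minimal sketch, assuming Linux, of how an ordinary process maps anonymous memory with mmap and asks the kernel to place it on one NUMA node with the mbind system call. The helper names node_mask and map_on_node are hypothetical, and the sketch says nothing about whether the mappings created by the CUDA driver honor such a policy. mbind may also be refused in restricted environments such as some containers, so the error code is handed back to the caller.

```cpp
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Value of MPOL_BIND in <linux/mempolicy.h>, repeated here so the
// sketch does not depend on libnuma headers.
constexpr unsigned long kMpolBind = 2;

// Single-word node mask with only `node` set. The node is kept below
// the top bit of the word so that one unsigned long is always enough.
unsigned long node_mask(int node) {
    assert(node >= 0 &&
           node < static_cast<int>(sizeof(unsigned long) * 8 - 1));
    return 1UL << node;
}

// Maps `bytes` of anonymous memory and requests that its pages come
// from `node`. Returns nullptr if mmap fails. *bind_errno is set to 0
// on success, or to the errno reported by mbind otherwise.
void* map_on_node(std::size_t bytes, int node, int* bind_errno) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;

    unsigned long mask = node_mask(node);
    long rc = syscall(SYS_mbind, p, static_cast<unsigned long>(bytes),
                      kMpolBind, &mask,
                      static_cast<unsigned long>(sizeof(mask) * 8), 0UL);
    *bind_errno = (rc == 0) ? 0 : errno;
    return p;  // the policy applies to pages as they are first touched
}
```

A caller would pass the node number that numactl --hardware reports for the pmem node, check bind_errno, and release the region with munmap when done.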

Searching around the internet, I find various discussions of how pmem can be used in lieu of regular DRAM, how to select pmem with numactl, etc., but as best I can tell this seems to be a work in progress. I don’t see any confirmation that pmem can at present be used completely transparently as system memory. But then I am not a Linux specialist and not necessarily familiar with the state of the art.