Is cudaMallocHost NUMA-aware?

For example, does it know that it should try to allocate memory on the same NUMA node where the current GPU is plugged in?

I passed memory allocated by cudaMallocHost to move_pages (a libnuma function that can tell you which NUMA node a page is located on), and on my dual-LGA2011 system the memory seems to end up on one node or the other at random.

I’m not entirely sure that pinned memory is a valid argument to move_pages, but it didn’t complain.
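For reference, here is a minimal sketch (not from the original post) of the kind of check described above: allocate pinned memory with cudaMallocHost, then pass the page addresses to move_pages in query mode (nodes == NULL), which fills the status array with the node each page currently resides on. The buffer size and the nvcc build line are assumptions.

```
// numa_query.cu -- minimal sketch: report which NUMA node each page of a
// cudaMallocHost buffer resides on, using move_pages() in query mode.
// Assumed build line: nvcc numa_query.cu -o numa_query -lnuma
#include <cstdio>
#include <cstring>
#include <vector>
#include <unistd.h>        // sysconf()
#include <numaif.h>        // move_pages()
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 4UL << 20;          // 4 MiB of pinned host memory (arbitrary size)
    void *buf = nullptr;
    if (cudaMallocHost(&buf, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed\n");
        return 1;
    }
    memset(buf, 0, bytes);                   // make sure the pages are faulted in

    const long page_size = sysconf(_SC_PAGESIZE);
    const size_t npages = bytes / page_size;
    std::vector<void *> pages(npages);
    std::vector<int> status(npages, -1);
    for (size_t i = 0; i < npages; ++i)
        pages[i] = static_cast<char *>(buf) + i * page_size;

    // pid == 0 means the calling process; nodes == NULL means "just tell me
    // where each page currently is" -- the answers land in status[].
    if (move_pages(0, npages, pages.data(), nullptr, status.data(), 0) != 0) {
        perror("move_pages");
        return 1;
    }
    for (size_t i = 0; i < npages; ++i)
        printf("page %zu -> node %d\n", i, status[i]);

    cudaFreeHost(buf);
    return 0;
}
```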

I think if you start by pinning the process to one socket, e.g. with taskset, you will find that the memory is allocated local to that process (i.e. from the memory belonging to the same socket that the process is pinned to), until you run out of free memory on that node.

If you pin the process, do you still see random locations for the pinned memory (assuming small allocations)?
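A sketch of the pinning experiment suggested above, done from inside the program rather than with taskset: restrict the calling thread to one NUMA node with libnuma's numa_run_on_node() before calling cudaMallocHost, then recheck the placement with move_pages as in the previous snippet. Node 0 is an arbitrary example choice.

```
// pin_then_alloc.cu -- sketch: bind the process to one NUMA node before
// allocating pinned memory, roughly equivalent to launching the program
// under taskset / numactl --cpunodebind. Node 0 is an arbitrary example.
// Assumed build line: nvcc pin_then_alloc.cu -o pin_then_alloc -lnuma
#include <cstdio>
#include <numa.h>          // numa_available(), numa_run_on_node()
#include <cuda_runtime.h>

int main() {
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma reports no NUMA support on this system\n");
        return 1;
    }
    // Restrict this thread to the CPUs of node 0; allocations should then
    // come from node 0 while that node still has free memory.
    if (numa_run_on_node(0) != 0) {
        perror("numa_run_on_node");
        return 1;
    }

    void *buf = nullptr;
    if (cudaMallocHost(&buf, 4UL << 20) != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost failed\n");
        return 1;
    }
    // ... verify the placement with move_pages() as in the previous snippet ...

    cudaFreeHost(buf);
    return 0;
}
```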

You are right: at first I didn't see much correlation, but after sync && echo 3 > /proc/sys/vm/drop_caches (which drops the page cache, freeing up memory on the local node), it worked out exactly as you predicted. Thanks for the answer!