Heterogeneous Memory Support (HMM) in NVIDIA UVM driver and Linux 4.14

Hi, I have been trying to make HMM work wit the newest possible setup. There have been a few topics about HMM support on the forum before, but none of them had any follow-ups. I have investigated the Nvidia drivers, and there already seems to be support for HMM, however I didn’t manage to make it work.

My setup: CUDA 9.1.85, NVIDIA driver 387.26, Linux 4.14.5 (Ubuntu build), GTX 1050 Ti and GTX 1080 Ti.

First I checked the HMM support in the kernel:

grep HMM /boot/config-4.14.5-041405-generic
CONFIG_ARCH_HAS_HMM=y
CONFIG_HMM=y
CONFIG_HMM_MIRROR=y

Then I modified the nvidia-uvm module sources and recompiled the module to enable HMM support, reinserted it with the required parameters, and made sure that the hmm correctly enabled using uvm_hmm_is_enabled:

// You need all of these things, in order to actually run HMM:
//
//     1) An HMM kernel, with CONFIG_HMM set.
//
//     2) UVM Kernel module parameter set: uvm_hmm=1
//
//     3) ATS must not be enabled
//
bool uvm_hmm_is_enabled(void)
{
    return (uvm_hmm != 0) && (uvm8_ats_mode == 0);
}

Got a small example to test the HMM:

$ cat ./hmm-test.cu
#include <stdio.h>

__global__ void
compute_this(int *pDataFromCpu)
{
    atomicAdd(pDataFromCpu, 1);
}

int main(void)
{
    int *pData = (int*)malloc(sizeof(int));
    *pData = 1;

    compute_this<<<1,1024>>>(pData);
    if (cudaDeviceSynchronize() != cudaSuccess)
        printf("Error \n");

    printf("Results: %d\n", *pData);
    free(pData);
    return 0;
}
$ /usr/local/cuda-9.1/bin/nvcc -gencode arch=compute_61,code=sm_61 hmm-test.cu -o hmm-test
$ ./hmm-test 
Error 
Results: 1

The corresponding dmesg is:

[Thu Dec 14 19:04:40 2017] nvidia-uvm: Unloaded the UVM driver in 8 mode
[Thu Dec 14 19:04:43 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
[Thu Dec 14 19:04:49 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000038, engmask 00000101, intr 10000000

When I try to ftrace the nvidia-uvm module, it successfully calls into linux kernel’s hmm functions, namel:

hmm_device_new()
hmm_device_put()
hmm_device_release(
hmm_mirror_register()
hmm_mirror_unregister()

and gets callbacks through .sync_cpu_device_pagetables = &mirror_sync_cpu_device_pagetables; however this callback is not implemented in the nvidia-uvm driver. This is a small sample of address ranges from the sync callback:

mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21bef3c000 - 0x00007f21c0000000 
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21dee00000 - 0x00007f21df000000 
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d0000000 - 0x00007f21def3b000 
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d5f39000 - 0x00007f21d8000000 
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21dc000000 - 0x00007f21ddf39000 
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d8000000 - 0x00007f21d8200000
...

Is there a way to make the current nvidia-uvm module and CUDA runtime work with Linux HMM, given that the code is already present in the driver, and can easily be enabled?

Hi,
thanks for your evaluation of current Nvidia HMM support…
some things to say:

  1. seems 390.xx series drivers is coming this year (see phoronix https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Mem-Perf-Regression)
    “Succeeding the NVIDIA 387 driver series will be the NVIDIA 390 Linux driver series, which should get into public beta in December.”
    hope you can test with 390 driver to see if mirror_sync_cpu_device_pagetables callback is implemented and HMM finally works…

  2. tried to replicate your tests with same CUDA 9.1 compiler, latest driver (387.34 patched for 4.15rc2 support) and kernel 4.15rc2 which is HMM enabled…
    but I use Maxwell GTX970 GPU so changed to sm_52
    “/usr/local/cuda-9.1/bin/nvcc -gencode arch=compute_52,code=sm_52 hmm-test.cu -o hmm-test”
    I get different error in dmesg and anyway seems HMM is only supported on Pascal and later GPUs right?

  3. just to be sure I’m testing my Maxwell GPU correctly your step
    " I modified the nvidia-uvm module sources and recompiled the module to enable HMM support," is not very clear:
    all I done is calling “make module” with NV_BUILD_SUPPORTS_HMM=1 argument:
    “make module SYSSRC=/usr/src/linux-headers-4.15.0-041500rc2-generic NV_BUILD_SUPPORTS_HMM=1”
    delete prveious module
    rmmod nvidia_uvm
    and then load module with
    modprobe nvidia_uvm uvm_hmm=1

notice I don’t need to modify source code

thanks…

Hi,

Did anyone successfully run HMM in NVIDIA driver and Linux with the latest NVIDIA driver on Pascal or other GPUs? If you did, which GPU/ driver version/ Linux version did you use?

Thanks,
Kevin

Up! any guild or example from Nvidia to try or test HMM?
appreciated!

Yes please… guidance from nVidia please.

[s]Maybe it’s too late, but I did. I needed the below command before I run my code.

sudo modprobe -r nvidia_uvm sudo modprobe nvidia_uvm uvm_hmm=1 uvm8_ats_mode=0

I needed to set uvm8_ats_mode 0 in addition to uvm_hmm.

I used V100 GPU on Skylake, nvidia driver: 418.67 (CUDA 10.1), Ubuntu 18.04LTS, Linux 4.15[/s]

Hmm, I couldn’t run HMM with the above configs at this time… maybe there was mistake around my test code. Sorry for my confusing post…

2 years later and this still does not seem to work despite all the hype from NVIDIA about it.

Using Kernel 5.0.0 with HMM support, Pascal GPU, and driver 440.33.01

modprobe nvidia_uvm uvm_hmm=1 uvm_ats8_mode=0

The test program still fails with an Xid Error. :/