Hi, I have been trying to make HMM work wit the newest possible setup. There have been a few topics about HMM support on the forum before, but none of them had any follow-ups. I have investigated the Nvidia drivers, and there already seems to be support for HMM, however I didn’t manage to make it work.
My setup: CUDA 9.1.85, NVIDIA driver 387.26, Linux 4.14.5 (Ubuntu build), GTX 1050 Ti and GTX 1080 Ti.
First I checked the HMM support in the kernel:
grep HMM /boot/config-4.14.5-041405-generic
CONFIG_ARCH_HAS_HMM=y
CONFIG_HMM=y
CONFIG_HMM_MIRROR=y
Then I modified the nvidia-uvm module sources and recompiled the module to enable HMM support, reinserted it with the required parameters, and made sure that the hmm correctly enabled using uvm_hmm_is_enabled:
// You need all of these things, in order to actually run HMM:
//
// 1) An HMM kernel, with CONFIG_HMM set.
//
// 2) UVM Kernel module parameter set: uvm_hmm=1
//
// 3) ATS must not be enabled
//
bool uvm_hmm_is_enabled(void)
{
return (uvm_hmm != 0) && (uvm8_ats_mode == 0);
}
Got a small example to test the HMM:
$ cat ./hmm-test.cu
#include <stdio.h>
__global__ void
compute_this(int *pDataFromCpu)
{
atomicAdd(pDataFromCpu, 1);
}
int main(void)
{
int *pData = (int*)malloc(sizeof(int));
*pData = 1;
compute_this<<<1,1024>>>(pData);
if (cudaDeviceSynchronize() != cudaSuccess)
printf("Error \n");
printf("Results: %d\n", *pData);
free(pData);
return 0;
}
$ /usr/local/cuda-9.1/bin/nvcc -gencode arch=compute_61,code=sm_61 hmm-test.cu -o hmm-test
$ ./hmm-test
Error
Results: 1
The corresponding dmesg is:
[Thu Dec 14 19:04:40 2017] nvidia-uvm: Unloaded the UVM driver in 8 mode
[Thu Dec 14 19:04:43 2017] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 238
[Thu Dec 14 19:04:49 2017] NVRM: Xid (PCI:0000:01:00): 31, Ch 00000038, engmask 00000101, intr 10000000
When I try to ftrace the nvidia-uvm module, it successfully calls into linux kernel’s hmm functions, namel:
hmm_device_new()
hmm_device_put()
hmm_device_release(
hmm_mirror_register()
hmm_mirror_unregister()
and gets callbacks through .sync_cpu_device_pagetables = &mirror_sync_cpu_device_pagetables; however this callback is not implemented in the nvidia-uvm driver. This is a small sample of address ranges from the sync callback:
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21bef3c000 - 0x00007f21c0000000
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21dee00000 - 0x00007f21df000000
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d0000000 - 0x00007f21def3b000
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d5f39000 - 0x00007f21d8000000
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21dc000000 - 0x00007f21ddf39000
mirror_sync_cpu_device_pagetables: type: 0, 0x00007f21d8000000 - 0x00007f21d8200000
...
Is there a way to make the current nvidia-uvm module and CUDA runtime work with Linux HMM, given that the code is already present in the driver, and can easily be enabled?