Enabling Heterogeneous Memory Management

I have been looking into enabling CUDA Unified Memory with Heterogeneous Memory Management (HMM). I can’t find any official release statement from NVIDIA, but it looks like it should now be available: the HMM modules were added to the Linux kernel in 4.14.

My environment is Fedora 26 with kernel 4.15.17-200.fc26.x86_64, and /boot/config-4.15.17-200.fc26.x86_64 has:

CONFIG_ARCH_HAS_HMM=y
CONFIG_HMM=y
CONFIG_HMM_MIRROR=y
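
Since several kernels can be installed side by side, it may be worth confirming that the kernel actually running is the one whose config was inspected. The following is only a minimal host-side sketch (no GPU code, but it builds with nvcc as well as a plain C/C++ compiler). It assumes the distribution installs the config as /boot/config-<release>, as Fedora does here; the helper name line_sets_option is made up for illustration.

```cuda
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

// Returns 1 if `line` is exactly "<opt>=y" (optionally followed by a newline).
// Hypothetical helper, not part of any library.
static int line_sets_option(const char *line, const char *opt)
{
    size_t n = strlen(opt);
    if (strncmp(line, opt, n) != 0) return 0;
    return strcmp(line + n, "=y") == 0 || strcmp(line + n, "=y\n") == 0;
}

int main(void)
{
    struct utsname u;
    if (uname(&u) != 0) { perror("uname"); return 1; }

    // Assumption: the config of the running kernel lives at /boot/config-<release>.
    char path[512];
    snprintf(path, sizeof(path), "/boot/config-%s", u.release);

    FILE *f = fopen(path, "r");
    if (f == NULL) { perror(path); return 1; }

    const char *opts[] = { "CONFIG_ARCH_HAS_HMM", "CONFIG_HMM", "CONFIG_HMM_MIRROR" };
    int found[3] = { 0, 0, 0 };
    char line[1024];
    while (fgets(line, sizeof(line), f) != NULL)
        for (int i = 0; i < 3; i++)
            if (line_sets_option(line, opts[i])) found[i] = 1;
    fclose(f);

    printf("running kernel: %s\n", u.release);
    for (int i = 0; i < 3; i++)
        printf("%s: %s\n", opts[i], found[i] ? "y" : "not set");
    return 0;
}
```

The match is deliberately exact, so that CONFIG_HMM is not mistaken for CONFIG_HMM_MIRROR and commented-out "is not set" lines are ignored.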

I have CUDA 9.1.85 and driver version 390.48, although I have also tried some older drivers.

This is my test code, partially borrowed from (https://devtalk.nvidia.com/default/topic/1027558/linux/heterogeneous-memory-support-hmm-in-nvidia-uvm-driver-and-linux-4-14/):

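#include <stdio.h>
#include <stdlib.h>

__global__ void
compute_this(int *pDataFromCpu)
{
    printf ("start\n");
    *pDataFromCpu = 7;   // write through a pointer to malloc'd host memory
    printf ("yay\n");
}

int main(void)
{
    // Plain malloc: the GPU can only touch this memory if HMM is working.
    int *pData = (int*)malloc(sizeof(int));
    if (pData == NULL) { printf("Malloc failed!\n"); exit(1); }
    // Managed-memory alternative (release it with cudaFree instead of free):
    //int *pData = NULL;
    //cudaMallocManaged(&pData, sizeof(int));
    *pData = 1;

    compute_this<<<1,1>>>(pData);
    if (cudaDeviceSynchronize() != cudaSuccess)
        printf("Error \n");

    printf("Results: %d\n", *pData);
    free(pData);   // pData came from malloc, so free() rather than cudaFree()
    return 0;
}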

I have nvidia_uvm loaded with uvm_hmm=1. The output of this code should be

start
yay
Results: 7

but instead I get

start
Error 
Results: 1

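which implies that the kernel fails at the write to the malloc'd pointer ("start" is printed but "yay" is not), so HMM does not appear to be in effect.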
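
To narrow down a failure like this, it can help to print the actual CUDA error string and to ask the runtime whether the device reports access to pageable memory at all. The sketch below is a minimal example under those assumptions, not a definitive diagnostic: it uses cudaDeviceGetAttribute with cudaDevAttrPageableMemoryAccess and cudaDevAttrConcurrentManagedAccess, and the kernel name write_seven is made up for illustration. Whether the 390.48 driver reflects HMM state in these attributes is not something I can confirm.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical test kernel: writes through a pointer that may refer to
// ordinary malloc'd host memory.
__global__ void write_seven(int *p) { *p = 7; }

int main()
{
    int dev = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) {
        printf("cudaGetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 1 if the device can coherently access pageable (malloc'd) host memory.
    int pageable = 0;
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);
    // 1 if CPU and GPU can access managed memory concurrently.
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    printf("pageableMemoryAccess: %d\n", pageable);
    printf("concurrentManagedAccess: %d\n", concurrent);

    int *p = (int *)malloc(sizeof(int));
    if (p == NULL) { printf("malloc failed\n"); return 1; }
    *p = 1;

    write_seven<<<1, 1>>>(p);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    else
        printf("Results: %d\n", *p);

    free(p);   // malloc'd memory is released with free(), not cudaFree()
    return err == cudaSuccess ? 0 : 1;
}
```

As far as I understand the attribute's definition, a pageableMemoryAccess value of 0 means the driver is not offering GPU access to ordinary malloc'd memory on this setup, in which case a failure like the one above would be expected regardless of the kernel configuration.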

Has anybody had luck getting this to work?