BUG: nvidia_uvm needs to be removed and re-inserted in order to work after wakeup from suspend

Hi @amrits, I was having the exact same issue on two machines (one laptop, one PC). I checked whether CUDA could still initialize with the following code:

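#include <stdio.h>
#include <cuda.h>

int main(void) {
    /* cuInit() has to succeed before any other driver API call works. */
    CUresult ret = cuInit(0);

    if (ret != CUDA_SUCCESS) {
        /* Print the raw CUresult value so it can be looked up in cuda.h. */
        fprintf(stderr, "cuInit failed! Error code: %d\n", (int)ret);
        return 1;
    }

    printf("CUDA initialized successfully!\n");

    return 0;
}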

Compiled and ran with this:

cc test_cuda.c -lcuda -I/opt/cuda/include && ./a.out

After suspending and resuming once, I get the following (999 is CUDA_ERROR_UNKNOWN):

cuInit failed! Error code: 999

If I run sudo modprobe -r nvidia_uvm && sudo modprobe nvidia_uvm, it works again. I have just set options nvidia NVreg_PreserveVideoMemoryAllocations=1 in /etc/modprobe.d/nvidia.conf and enabled nvidia-suspend.service and nvidia-hibernate.service (as per the Arch wiki). That seems to have fixed my issue: I no longer need to unload and reload nvidia_uvm after resume. I have only tested this briefly, though, so if things change I will let you know.
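For anyone who wants to try the same thing, the workaround and the configuration change described above could be scripted roughly like this. This is only a sketch: the function names, the DRY_RUN switch and the CONF variable are made up for illustration, and the config path assumes the usual /etc/modprobe.d/ location. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# CONF is an assumed path; adjust it to where your distro keeps modprobe options.
CONF="${CONF:-/etc/modprobe.d/nvidia.conf}"
OPTION='options nvidia NVreg_PreserveVideoMemoryAllocations=1'

# Run a command, or just print it in dry-run mode (the default).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Temporary workaround: unload and reload nvidia_uvm after resume.
reload_uvm() {
    run sudo modprobe -r nvidia_uvm && run sudo modprobe nvidia_uvm
}

# Longer-term change: append the module option and enable the sleep services.
# Note that running this twice appends the option line twice.
enable_preserve() {
    run sudo sh -c "echo '$OPTION' >> '$CONF'"
    run sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service
}

# Pick an action from the first argument; with no argument nothing happens.
case "${1:-}" in
    reload) reload_uvm ;;
    enable) enable_preserve ;;
esac
```

Saved under a name of your choice, running it with the argument reload prints the two modprobe commands, and running it with DRY_RUN=0 and the argument enable would actually apply the change. Module options are read when the module is loaded, so the nvidia module has to be reloaded (or the machine rebooted) before the new setting takes effect.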

In case it helps, attached is an nvidia-bug-report from before I made those changes.
nvidia-bug-report.log.gz (1.8 MB)