init_on_alloc=0 causes CUDA initialization to fail in some circumstances

CUDA fails to initialize if both the nvidia module option NVreg_InitializeSystemMemoryAllocations=0 and the kernel option init_on_alloc=0 are set.

It does, however, work if you set NVreg_InitializeSystemMemoryAllocations=1 and init_on_alloc=0.

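init_on_alloc is usually set to 1, but that depends on how the kernel is configured (the built-in default comes from CONFIG_INIT_ON_ALLOC_DEFAULT_ON and can be overridden on the kernel command line). If it is set to 1, CUDA always works.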

Still, this can cause avoidable problems if people combine these settings the wrong way, so I think it should be fixed, or at least investigated.

When it fails, clinfo only shows this:

$ clinfo 
Number of platforms                               0

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.3.2
  ICD loader Profile                              OpenCL 3.0

Setting NVreg_InitializeSystemMemoryAllocations=1 didn’t help me. On my system, init_on_alloc must remain at the Linux default (1), or else CUDA is unusable.
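To make the problematic combination easier to spot, here is a minimal Python sketch that classifies a system from its kernel command line and NVIDIA driver parameters. It assumes the values can be read from /proc/cmdline and /proc/driver/nvidia/params, and that the driver lists the option there as "InitializeSystemMemoryAllocations: <value>"; the function names are hypothetical. The classification only encodes the reports in this thread: init_on_alloc=1 works, init_on_alloc=0 with NVreg_InitializeSystemMemoryAllocations=0 fails, and init_on_alloc=0 with the NVreg option at 1 worked for one user but not for another.

```python
"""Hypothetical checker for the init_on_alloc / NVreg_InitializeSystemMemoryAllocations
combination discussed in this thread. A sketch, not an NVIDIA tool."""
from typing import Optional


def parse_kernel_bool(value: str) -> Optional[bool]:
    # Roughly mirrors the spellings the kernel's kstrtobool() accepts.
    v = value.strip().lower()
    if v in ("1", "y", "yes", "on"):
        return True
    if v in ("0", "n", "no", "off"):
        return False
    return None


def init_on_alloc_from_cmdline(cmdline: str) -> Optional[bool]:
    # The last occurrence wins. None means the option is absent, so the
    # kernel's build-time default (CONFIG_INIT_ON_ALLOC_DEFAULT_ON) applies.
    value = None
    for token in cmdline.split():
        if token.startswith("init_on_alloc="):
            value = parse_kernel_bool(token.split("=", 1)[1])
    return value


def nvidia_param(params_text: str, name: str) -> Optional[int]:
    # Assumption: "Name: value" lines, as in /proc/driver/nvidia/params.
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def assess(cmdline: str, params_text: str) -> str:
    init_on_alloc = init_on_alloc_from_cmdline(cmdline)
    nv_init = nvidia_param(params_text, "InitializeSystemMemoryAllocations")
    if init_on_alloc is True:
        return "ok"       # reported to always work
    if init_on_alloc is None:
        return "unknown"  # depends on the kernel's build-time default
    if nv_init == 0:
        return "fail"     # the combination reported to break CUDA
    return "warn"         # reported to work for one user but not another


def _read(path: str) -> str:
    # Missing or unreadable files are treated as empty.
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(assess(_read("/proc/cmdline"), _read("/proc/driver/nvidia/params")))
```

For example, assess("root=/dev/sda1 init_on_alloc=0", "InitializeSystemMemoryAllocations: 0") returns "fail". The parsing is kept separate from the file reads so it can be tried on pasted text from another machine.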

This issue affects both the legacy closed-source and the new open-source drivers.

The symptom, when one attempts to use CUDA, is this message in the system journal:

kernel: NVRM: nvGpuOpsReportFatalError: uvm encountered global fatal error 0x60, requiring os reboot to recover
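To check whether a machine has hit this failure, one could scan the kernel log for the nvGpuOpsReportFatalError message. A minimal Python sketch, assuming journalctl is available and that the message text matches the line quoted in this report; the helper names are hypothetical:

```python
"""Hypothetical helper: find the UVM fatal error quoted in this report in kernel log text."""
import re
import subprocess
from typing import List

UVM_FATAL = re.compile(
    r"nvGpuOpsReportFatalError: uvm encountered global fatal error (0x[0-9a-fA-F]+)"
)


def find_uvm_fatal_errors(log_text: str) -> List[str]:
    # Returns each error code found (e.g. "0x60"), in order of appearance.
    return UVM_FATAL.findall(log_text)


def current_boot_kernel_log() -> str:
    # "journalctl -k" shows kernel messages from the current boot.
    try:
        result = subprocess.run(
            ["journalctl", "-k", "--no-pager"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    codes = find_uvm_fatal_errors(current_boot_kernel_log())
    print(codes or "no UVM fatal error logged this boot")
```

Keeping the matching in a pure function means it can also be run on a journal excerpt copied from another system.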