331.20 and multiple kernel modules

Hello,

According to the release notes [1] and the documentation FAQ [2], it is possible to build up to eight kernel modules in order to reduce GPU access overhead for multiple devices. I tried installing the nvidia driver this way and found two main issues with this new compilation mode.

First, the Unified Memory access driver (nvidia-uvm.ko) does not seem to be compatible with building multiple module instances. For me it wasn’t a big deal, but it should be documented somewhere to save people a few hours of figuring out why the NVIDIA installer fails when run with --multiple-kernel-modules. Alternatively, I’d suggest having the installer imply --no-unified-memory when --multiple-kernel-modules is used.

Second, I couldn’t use the Xorg driver with multiple kernel modules. I was attempting a multiseat configuration: two graphics cards controlled by two independent Xorg servers. After following all the instructions (loading the modules with the NVreg_AssignGpus option and exporting the __NVIDIA_KERNEL_MODULE_INSTANCE environment variable), Xorg still refused to start, complaining that it could not find any graphics device.

Therefore, it seems that the nvidia Xorg driver still looks for the nvidia.ko kernel module (not nvidia-frontend.ko). Maybe the multiple kernel modules feature is intended only for CUDA/OpenCL users. If so, is this a bug or is it the expected behaviour?

Best regards.

References:

  1. http://www.nvidia.com/download/driverResults.aspx/69372/en-us
  2. http://us.download.nvidia.com/XFree86/Linux-x86_64/331.20/README/faq.html

iSac, thanks for reporting this issue. Please provide an nvidia bug report for both issues. Could you also provide step-by-step reproduction steps?

Filed Bug 1414996, "Nvidia installer fail to install driver when passed option --multiple-kernel-modules=#", to track this issue.

Thanks for filing the bug report.

The nvidia Xorg module does not seem to work with multiple kernel modules. Try to launch Xorg manually, as root, with the __NVIDIA_KERNEL_MODULE_INSTANCE variable exported. The nvidia driver will still complain.
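For anyone trying to reproduce this, here is a minimal sketch of one way to set the variable for a single command rather than for the whole shell. The helper name run_on_instance is hypothetical (it is not part of the NVIDIA driver), and the display numbers and config file names in the example are placeholders.

```shell
#!/bin/sh
# run_on_instance N CMD [ARGS...]
# Hypothetical helper, not shipped with the driver: runs CMD with
# __NVIDIA_KERNEL_MODULE_INSTANCE set to N for that one process only,
# leaving the calling shell's environment unchanged.
run_on_instance() {
    instance=$1
    shift
    __NVIDIA_KERNEL_MODULE_INSTANCE="$instance" "$@"
}

# Example (as root), one X server per module instance.
# Display numbers and config file names are placeholders:
#   run_on_instance 0 Xorg :0 -config xorg.0.conf &
#   run_on_instance 1 Xorg :1 -config xorg.1.conf &
```

Setting the variable per command keeps two servers from accidentally inheriting the same instance number from the shell.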

Tracking the Xorg failure to start under Bug 1421460.

Is there a solution for this bug? It seems nvidia_drv.so fails to find/load the nvidia kernel module.
I have set the environment variable __NVIDIA_KERNEL_MODULE_INSTANCE before running the Xorg server.
I’m using a GRID K1.

I can see /dev/nvidia0, /dev/nvidiactl0, /dev/nvidia1, /dev/nvidiactl1 … /dev/nvidia3, /dev/nvidiactl3.
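A quick way to check that listing from a script is sketched below. It assumes one GPU per module instance, so that /dev/nvidiaN and /dev/nvidiactlN share the same number as in the listing above; the function name list_instances is hypothetical, and the optional directory argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# list_instances [DEVDIR]
# Hypothetical helper: prints each number N for which both
# DEVDIR/nvidiactlN and DEVDIR/nvidiaN exist (DEVDIR defaults to /dev).
# Assumes one GPU per module instance, so the two numbers match.
list_instances() {
    devdir=${1:-/dev}
    for ctl in "$devdir"/nvidiactl[0-9]*; do
        # Skip the unexpanded pattern when no control node exists.
        if [ ! -e "$ctl" ]; then
            continue
        fi
        n=${ctl##*/nvidiactl}
        if [ -e "$devdir/nvidia$n" ]; then
            echo "$n"
        fi
    done
    return 0
}

# Example:
#   list_instances
```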

Running nvidia-smi as:
$ __NVIDIA_KERNEL_MODULE_INSTANCE=0 nvidia-smi
works well.

This is how I attempt to run the Xorg server:
export __NVIDIA_KERNEL_MODULE_INSTANCE=0
Xorg :0 -config xorg.0.conf &

[ 11560.290] (EE) NVIDIA: Failed to load the NVIDIA kernel module. Please check your
[ 11560.290] (EE) NVIDIA: system’s kernel log for additional error messages.
[ 11560.290] (EE) No devices detected.
[ 11560.290]
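When comparing logs between instances, it can help to pull out only the error lines. The sketch below is one way to do that; the function name xorg_errors is hypothetical, and the default path /var/log/Xorg.0.log is an assumption that depends on the display number and distribution.

```shell
#!/bin/sh
# xorg_errors [LOGFILE]
# Hypothetical helper: prints the lines of an Xorg log that are tagged (EE),
# with or without a leading "[ timestamp]" prefix. Lines that merely mention
# (EE) later on, such as the marker legend, are not matched.
# The default log path is an assumption.
xorg_errors() {
    grep -E '^(\[[^]]*\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}

# Example:
#   xorg_errors /var/log/Xorg.0.log
# The kernel log mentioned in the message can be searched with, for example:
#   dmesg | grep -i nvidia
```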
