Hi Team,
I have a new Debian 11 server to which I added an Nvidia L4 Tensor GPU, but I am getting the errors below when trying to install cuda-12.1.0.
Server specification:
OS: Debian 11
Arch: x86_64
Kernel: 5.10.27 (also tried with 5.10.26 and 5.10.28)
GPU: 2x Nvidia L4 Tensor
Cuda-toolkit version: 12.1.0
Nvidia-Driver: 530.x.x
Error:
make[1]: Leaving directory ‘/usr/src/linux-5.10.27’
→ done.
→ Kernel module compilation complete.
ERROR: Unable to load the kernel module ‘nvidia.ko’. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries ‘Kernel module load error’ and ‘Kernel messages’ at the end of the file ‘/var/log/nvidia-installer.log’ for more information.
→ Kernel module load error: No such device
→ Kernel messages:
[ 1114.484768] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 1114.484772] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 1114.485640] nvidia: probe of 0000:01:00.0 failed with error -1
[ 1114.485656] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1114.485685] NVRM: None of the NVIDIA devices were initialized.
[ 1114.485846] nvidia-nvlink: Unregistered Nvlink Core, major device number 247
[ 3302.465119] device-mapper: uevent: version 1.0.3
[ 3302.465788] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com
[ 3927.672828] VFIO - User Level meta-driver version: 0.3
[ 3927.719086] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 3927.719090] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 3927.720036] nvidia: probe of 0000:01:00.0 failed with error -1
[ 3927.720049] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 3927.720049] NVRM: None of the NVIDIA devices were initialized.
[ 3927.720238] nvidia-nvlink: Unregistered Nvlink Core, major device number 247
[ 4293.441205] VFIO - User Level meta-driver version: 0.3
[ 4293.488659] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[ 4293.488663] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 4293.489583] nvidia: probe of 0000:01:00.0 failed with error -1
[ 4293.489596] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 4293.489596] NVRM: None of the NVIDIA devices were initialized.
[ 4293.489771] nvidia-nvlink: Unregistered Nvlink Core, major device number 247
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
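
The kernel messages above repeat the same complaint on every load attempt: BAR0 of the device at PCI:0000:01:00.0 is reported as 0M @ 0x0. To make that easier to spot in a long log, here is a minimal Python sketch that scans log text for these NVRM lines and lists each device whose BAR has a zero size and a zero address. The function name `find_unassigned_bars` and the regular expression are illustrative assumptions based only on the line format shown in this post; they are not part of any NVIDIA tool.

```python
import re

# Matches the continuation line printed by the driver, for example:
#   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
# The pattern is an assumption derived from the log lines in this post.
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\)"
)


def find_unassigned_bars(log_text):
    """Return sorted, de-duplicated (pci_address, bar_index) pairs
    for every BAR reported with size 0M at address 0x0."""
    hits = set()
    for match in BAR_LINE.finditer(log_text):
        size_is_zero = int(match.group("size")) == 0
        addr_is_zero = int(match.group("addr"), 16) == 0
        if size_is_zero and addr_is_zero:
            hits.add((match.group("pci"), int(match.group("bar"))))
    return sorted(hits)


if __name__ == "__main__":
    # Sample lines copied from the kernel messages in this post.
    sample = """\
[ 1114.484772] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 1114.485640] nvidia: probe of 0000:01:00.0 failed with error -1
"""
    for pci, bar in find_unassigned_bars(sample):
        print(f"{pci}: BAR{bar} has no address assigned")
```

Run against the messages in this post, it reports only 0000:01:00.0, which matches the "probe routine failed for 1 device(s)" line.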