I want to install the NVIDIA driver and CUDA 7.5 in a CentOS 7 virtual machine, but I have had no luck so far.
On the physical machine the driver works well, and the CUDA “deviceQuery” sample reports the Tesla GPU information.
I then enabled IOMMU and VFIO for GPU passthrough, configured OpenStack with pci_passthrough and pci_alias, and created a flavor. Provisioning the VM works and the GPU device is visible inside the VM, but installing the NVIDIA driver fails.
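For readers less familiar with the Nova side: in Kilo-era releases this setup is usually expressed as a `pci_passthrough_whitelist` entry on the compute node, a matching `pci_alias` entry, and a flavor extra spec `pci_passthrough:alias`. These option names are an assumption about what "pci_passthrough" refers to here, and the helper `nova_pci_conf` and the alias name `K40` are hypothetical. The vendor/product IDs are the ones `lspci` reports in this post.

```shell
#!/bin/sh
# Hypothetical helper: print Kilo-era nova.conf lines (option names assumed)
# that whitelist one PCI vendor/product ID and give it an alias name.
nova_pci_conf() {
    vendor="$1"
    product="$2"
    alias_name="$3"
    printf 'pci_passthrough_whitelist = {"vendor_id": "%s", "product_id": "%s"}\n' \
        "$vendor" "$product"
    printf 'pci_alias = {"vendor_id": "%s", "product_id": "%s", "name": "%s"}\n' \
        "$vendor" "$product" "$alias_name"
}

# Tesla K40m as reported by lspci: [10de:1023]
nova_pci_conf 10de 1023 K40

# The flavor then requests one device through the alias, e.g.:
#   nova flavor-key <flavor> set "pci_passthrough:alias"="K40:1"
```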
Here is my current configuration. Any advice? Thanks.
- On the Nova host (physical machine)
IOMMU is enabled:
[root@kilo-k40 ~]# dmesg |grep -e IOMMU
[ 0.000000] DMAR: IOMMU enabled
[ 0.141127] DMAR-IR: IOAPIC id 10 under DRHD base 0xfbffc000 IOMMU 0
[ 0.141129] DMAR-IR: IOAPIC id 8 under DRHD base 0xc7ffc000 IOMMU 1
[ 0.141130] DMAR-IR: IOAPIC id 9 under DRHD base 0xc7ffc000 IOMMU 1
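To repeat this IOMMU check in a script rather than by eye, a minimal sketch could grep for the same message. The function name `iommu_enabled` is hypothetical, and the pattern only covers the Intel VT-d wording shown above; AMD hosts log something different.

```shell
#!/bin/sh
# Succeed if the given kernel log text contains the Intel VT-d message
# "DMAR: IOMMU enabled". AMD hosts use different wording, so this
# pattern only covers the Intel case.
iommu_enabled() {
    printf '%s\n' "$1" | grep -q 'DMAR: IOMMU enabled'
}

# On the host (reading the kernel log may require root):
#   if iommu_enabled "$(dmesg)"; then echo "IOMMU on"; else echo "IOMMU off"; fi
```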
[root@kilo-k40 ~]# lspci -Dnn |grep -i nvidia
0000:0d:00.0 3D controller : NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
0000:0e:00.0 3D controller : NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
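The `[vendor:device]` pair at the end of each line (here `10de:1023`) is what Nova's vendor/product matching works from. A sketch that extracts the PCI address and that pair for NVIDIA devices (vendor `10de`) from `lspci -Dnn` output; `nvidia_pci_ids` is a hypothetical name.

```shell
#!/bin/sh
# Print "<PCI address> <vendor:device>" for each NVIDIA (vendor 10de)
# line of `lspci -Dnn` output passed as the first argument.
nvidia_pci_ids() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9a-f:.]*\) .*\[\(10de:[0-9a-f]\{4\}\)].*/\1 \2/p'
}

# On the host:  nvidia_pci_ids "$(lspci -Dnn)"
```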
[root@kilo-k40 ~]# lsmod |grep vfio
vfio_pci 36864 2
vfio_iommu_type1 20480 2
vfio_virqfd 16384 1 vfio_pci
vfio 28672 8 vfio_iommu_type1,vfio_pci
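A scripted version of the `lsmod` check might report which of the expected VFIO modules are absent. It matches only the first column, so a module that appears merely in another module's "used by" list is still reported as missing. The helper name `missing_modules` is hypothetical.

```shell
#!/bin/sh
# Print each module name (arguments 2..n) that does not appear in the
# first column of the given `lsmod` output (argument 1).
missing_modules() {
    lsmod_text="$1"
    shift
    for mod in "$@"; do
        printf '%s\n' "$lsmod_text" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' ||
            echo "$mod"
    done
}

# On the host:  missing_modules "$(lsmod)" vfio vfio_pci vfio_iommu_type1
```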
[root@kilo-k40 ~]# readlink /sys/bus/pci/devices/0000:0d:00.0/driver
[root@kilo-k40 ~]# readlink /sys/bus/pci/devices/0000:0e:00.0/driver
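VFIO passthrough generally expects the GPU to be bound to the `vfio-pci` driver on the host; depending on how libvirt manages the device, that binding may only exist while the guest is running. A sketch that makes the bound driver explicit instead of relying on raw `readlink` output. The function `bound_driver` and the `SYSFS_PCI` override (used only for testing) are hypothetical.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI device, or "none" if the
# device has no driver symlink. SYSFS_PCI defaults to the real sysfs
# directory; overriding it is a hypothetical hook for testing.
bound_driver() {
    link="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# On the host:
#   for dev in 0000:0d:00.0 0000:0e:00.0; do echo "$dev $(bound_driver "$dev")"; done
```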
The host kernel is 4.2.8 and qemu-kvm is 2.3.0.
- In the virtual machine
The GPU device is visible:
[root@mj-223test-mjcentos7-gpuk40m2 ~]# lspci -Dnn |grep -i nvidia
0000:00:05.0 3D controller : NVIDIA Corporation GK110BGL [Tesla K40m] [10de:1023] (rev a1)
The guest kernel is also 4.2.8.
The nouveau driver has been added to the modprobe blacklist.
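For comparison, a common way to blacklist nouveau on CentOS 7 is a file under `/etc/modprobe.d/` that both blacklists the module and disables its modesetting, followed by an initramfs rebuild, since an initramfs built earlier may still load nouveau at boot. A minimal sketch; the helper name `write_nouveau_blacklist` is hypothetical and the file name is a convention, not a requirement.

```shell
#!/bin/sh
# Write a modprobe blacklist for nouveau to the given path (default is a
# conventional file name; any *.conf under /etc/modprobe.d is read).
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# As root in the guest, then rebuild the initramfs and reboot:
#   write_nouveau_blacklist && dracut --force && reboot
# Afterwards, `lsmod | grep nouveau` should print nothing.
```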
Installing NVIDIA driver 352.99 fails:
[root@mj-223test-mjcentos7-gpuk40m2 ~]# sh NVIDIA-Linux-x86_64-352.99.run --kernel-source-path /usr/src/linux-4.2.8/
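Because the installer is pointed at a hand-specified kernel tree, it may be worth confirming that the tree was configured for the running kernel. A configured and built tree normally contains `include/config/kernel.release`; the sketch below compares it with `uname -r`. The function name `kernel_src_matches` is hypothetical.

```shell
#!/bin/sh
# Succeed if the kernel source tree in $1 was configured for the kernel
# release in $2 (defaults to the running kernel, `uname -r`).
# Returns 2 if the tree has no include/config/kernel.release file.
kernel_src_matches() {
    src="$1"
    running="${2:-$(uname -r)}"
    release_file="$src/include/config/kernel.release"
    if [ ! -r "$release_file" ]; then
        echo "missing $release_file (tree not configured/built?)" >&2
        return 2
    fi
    [ "$(cat "$release_file")" = "$running" ]
}

# In the guest:  kernel_src_matches /usr/src/linux-4.2.8 && echo "sources match"
```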
[root@mj-223test-mjcentos7-gpuk40m2 ~]# vim /var/log/nvidia-installer.log
-> Kernel module compilation complete.
-> Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 9.383053] [drm] size 33554432
[ 9.383054] [drm] fb depth is 24
[ 9.383054] [drm] pitch is 3072
[ 9.386479] fbcon: cirrusdrmfb (fb0) is primary device
[ 9.417744] Console: switching to colour frame buffer device 128x48
[ 9.434084] cirrus 0000:00:02.0: fb0: cirrusdrmfb frame buffer device
[ 9.434085] cirrus 0000:00:02.0: registered panic notifier
[ 9.468661] [drm] Initialized cirrus 1.0.0 20110418 for 0000:00:02.0 on minor 0
[ 36.749597] random: nonblocking pool is initialized
[ 44.233814] Adjusting kvm-clock more than 11% (9437140 vs 9311354)
[ 194.989636] nvidia: module license 'NVIDIA' taints kernel.
[ 194.989644] Disabling lock debugging due to kernel taint
[ 194.995840] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 195.070423] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
[ 195.072752] NVRM: The NVIDIA GPU 0000:00:05.0 (PCI ID: 10de:1023)
NVRM: installed in this system is not supported by the 352.99
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 195.112234] nvidia: probe of 0000:00:05.0 failed with error -1
[ 195.191772] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 195.191776] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 195.191778] [drm] Module unloaded
[ 195.191968] NVRM: NVIDIA init module failed!
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
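In a long installer log, the NVRM lines are where the kernel module states its own reason for the failed probe. A sketch that pulls out just those lines; the helper name `nvrm_messages` is hypothetical, and the default path is the one used in this post.

```shell
#!/bin/sh
# Print the NVRM kernel messages recorded in an nvidia-installer log
# (default path as in the post), starting each at its "NVRM: " prefix.
nvrm_messages() {
    grep -o 'NVRM: .*' "${1:-/var/log/nvidia-installer.log}"
}

# In the guest:  nvrm_messages
```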
Any ideas? Thanks!