(EE) no screens found (EE) while trying to enable the nvidia driver on Debian 11 virtualized with bhyve

Hello @generix.

We have been able to hide the bhyve / kvm signature and it seems to have worked, although I’m not totally sure, since I still can’t get my external monitor to turn on. I suspect, though, that this is unrelated to the old error.

So, on Debian Linux, I’ve attached an old monitor to the HDMI port of the graphics card and tried to run “startx” from the Linux terminal. This is what happened:

root@marietto-BHYVE:~# nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
 Name      : NVIDIA GeForce RTX 2080 Ti
 UUID      : GPU-74c6f81e-9e6c-f279-2878-2d28f4333f6b

 PCI BusID : PCI:0:3:0

 Number of Display Devices: 1

 Display Device 0 (TV-2):
     EDID Name             : Samsung SyncMaster
     Minimum HorizSync     : 30.000 kHz
     Maximum HorizSync     : 81.000 kHz
     Minimum VertRefresh   : 56 Hz
     Maximum VertRefresh   : 75 Hz
     Maximum PixelClock    : 140.000 MHz
     Maximum Width         : 1280 pixels
     Maximum Height        : 1024 pixels
     Preferred Width       : 1280 pixels
     Preferred Height      : 1024 pixels
     Preferred VertRefresh : 60 Hz
     Physical Width        : 340 mm
     Physical Height       : 270 mm

root@marietto-BHYVE:~# nvidia-xconfig

WARNING: Unable to locate/open X configuration file.

New X configuration file written to '/etc/X11/xorg.conf'

root@marietto-BHYVE:~# nano /etc/X11/xorg.conf
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 470.57.02

Section "ServerLayout"
   Identifier     "Layout0"
   Screen      0  "Screen0"
   InputDevice    "Keyboard0" "CoreKeyboard"
   InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
   # generated from default
   Identifier     "Mouse0"
   Driver         "mouse"
   Option         "Protocol" "auto"
   Option         "Device" "/dev/psaux"
   Option         "Emulate3Buttons" "no"
   Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
   # generated from default
   Identifier     "Keyboard0"
   Driver         "kbd"
EndSection

Section "Monitor"
   Identifier     "Monitor0"
   VendorName     "Unknown"
   ModelName      "Unknown"
   Option         "DPMS"
EndSection

Section "Device"
   Identifier     "Device0"
   Driver         "nvidia"
   BusID          "PCI:0:3:0"
   VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
   Identifier     "Screen0"
   Device         "Device0"
   Monitor        "Monitor0"
   DefaultDepth    24
   SubSection     "Display"
       Depth       24
   EndSubSection
EndSection

root@marietto-BHYVE:~# startx

X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
Build Operating System: linux Debian
Current Operating System: Linux marietto-BHYVE 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 root=UUID=a33689a9-06d9-4bba-9e75-fdd1831b7e48 ro quiet splash resume=UUID=044c5b8f-f086-4491-9606-3bfa409b9d73
Build Date: 13 April 2021  04:07:31PM
xorg-server 2:1.20.11-1 (https://www.debian.org/support)  
Current version of pixman: 0.40.0
       Before reporting problems, check http://wiki.x.org
       to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
       (++) from command line, (!!) notice, (II) informational,
       (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Nov  3 09:41:43 2021
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using config directory: "/etc/X11/xorg.conf.d"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE)  
Fatal server error:
(EE) AddScreen/ScreenInit failed for driver 0
(EE)  
(EE)  
Please consult the The X.Org Foundation support  
        at http://wiki.x.org
for help.  
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)  
(EE) Server terminated with error (1). Closing log file.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error

I've attached the Xorg log file. My monitor does not turn on at all. I thought the cause was that my monitor is very old, but I tried another monitor and saw the same error.

Xorg.0.log_monitor (8.6 KB)

NVIDIA(0): Failed to allocate shared surface
The nvidia driver depends on some CPU features; maybe those are not announced by bhyve. Please post the output of:
cat /proc/cpuinfo

Linux marietto-BHYVE 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Nov  3 20:30:53 2021 from 192.168.1.6

root@marietto-BHYVE:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 1
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7200.13
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 3
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7199.97
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 4
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7200.21
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 5
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7201.70
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 6
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7199.96
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 13
cpu MHz         : 3600.000
cache size      : 16384 KB
physical id     : 7
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 bmi2 erms invpcid rtm rdseed xsaveopt arat md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips        : 7199.09
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Would you like to compare the previous Xorg log file with one from a working Xorg setup? I got this log file under Linux + qemu + kvm, where I ran startx just as I did on FreeBSD + bhyve, and there (under Linux) it worked. Log file attached.

Xorg.0.log (23.3 KB)

I’ve also attached the cpuinfo output obtained on Linux…

cpuinfo_log (5.0 KB)

I’ve also tried the nvidia driver in xorg.conf with the modesetting config and got the same error. Logs attached…

xorg.conf-modesetting (1.4 KB)
Xorg.0.log-modesetting (9.2 KB)

Usually, that error shows up when PAT support is either not advertised by the CPU or is disabled. That doesn’t seem to be the case here. Please attach a full dmesg output from boot-up.
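
For anyone following along, a quick way to double-check that point is to look for the mtrr and pat flags in the cpuinfo output. A minimal Python sketch, assuming the usual "flags : ..." line format of /proc/cpuinfo; the function names and the shortened sample text are made up for illustration:

```python
def cpu_flags(cpuinfo_text):
    """Return one set of flags per 'flags' line in /proc/cpuinfo-style text."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return per_cpu


def missing_flags(cpuinfo_text, required=("mtrr", "pat")):
    """Map CPU index (order of appearance) to the required flags it does not list."""
    missing = {}
    for index, flags in enumerate(cpu_flags(cpuinfo_text)):
        absent = sorted(set(required) - flags)
        if absent:
            missing[index] = absent
    return missing


# Hypothetical, shortened sample. On a real guest, pass the contents of
# /proc/cpuinfo instead (each flags entry must be on a single line).
SAMPLE = """\
processor : 0
flags     : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
processor : 1
flags     : fpu vme de pse tsc msr pae mce cx8 apic sep pge mca cmov
"""

print(missing_flags(SAMPLE))
```

An empty result only means the flags are advertised. As the rest of the thread shows, the guest kernel can still end up with MTRR and PAT unusable, so dmesg has to be checked as well.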

ok.

dmesg_full_log (41.4 KB)

KVM:

[ 0.000132] MTRR default type: write-back
[ 0.000132] MTRR fixed ranges enabled:
[ 0.000133] 00000-9FFFF write-back
[ 0.000133] A0000-FFFFF uncachable
[ 0.000133] MTRR variable ranges enabled:
[ 0.000134] 0 base 00C0000000 mask FFC0000000 uncachable
[ 0.000134] 1 base 00B0000000 mask FFF0000000 uncachable
[ 0.000135] 2 base 0800000000 mask F800000000 uncachable
[ 0.000135] 3 disabled
[ 0.000135] 4 disabled
[ 0.000136] 5 disabled
[ 0.000136] 6 disabled
[ 0.000136] 7 disabled
[ 0.012530] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT

bhyve:

[ 0.000018] MTRR default type: uncachable
[ 0.000018] MTRR variable ranges disabled:
[ 0.000019] Disabled
[ 0.000019] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000021] CPU MTRRs all blank - virtualized system.
[ 0.000023] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC

Looks like some more work to do on the bhyve side.
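
To make the comparison above easier to repeat, here is a rough Python sketch that pulls the "x86/PAT: Configuration" line out of a dmesg dump and reports whether a write-combining (WC) entry is present. The two sample lines are copied from the excerpts above; the function names are hypothetical, and treating the missing WC entry as the relevant difference is my assumption, not something the driver documentation states.

```python
import re

# Matches the kernel's PAT summary line and captures the eight memory types.
PAT_LINE = re.compile(r"x86/PAT: Configuration \[0-7\]:\s*(.*)")


def pat_entries(dmesg_text):
    """Return the PAT memory types from the last matching dmesg line, or None."""
    found = None
    for line in dmesg_text.splitlines():
        match = PAT_LINE.search(line)
        if match:
            found = match.group(1).split()
    return found


def has_write_combining(dmesg_text):
    """True if the reported PAT configuration contains a WC entry."""
    entries = pat_entries(dmesg_text)
    return entries is not None and "WC" in entries


KVM_LINE = "[ 0.012530] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT"
BHYVE_LINE = "[ 0.000023] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC"

for name, text in (("KVM", KVM_LINE), ("bhyve", BHYVE_LINE)):
    print(name, pat_entries(text), "WC present:", has_write_combining(text))
```

On a live system the same functions can be fed the output of dmesg saved to a file.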

Can you be more specific about what is needed to make the nvidia driver work? Does it require MTRR to be enabled? Does disabling MTRR and treating all memory as uncacheable hurt?

Yes, the xorg graphics driver needs mtrr (and pat, which depends on mtrr) to run.
It might be interesting to see if wayland and cuda would work without it, but that’s more of an academic question.

For demonstration purposes, you can run the otherwise working kvm/debian vm with kernel parameter ‘nopat’ and you would run into the same issue.
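
When trying that experiment, it is worth confirming that the parameter really reached the guest kernel. A small Python sketch, assuming the command line is taken from /proc/cmdline or from the "Kernel command line:" line in the Xorg log; the helper name and the shortened sample string are hypothetical:

```python
def kernel_params(cmdline):
    """Split a kernel command line into name -> value (None for bare flags)."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Hypothetical sample: a shortened command line with nopat appended.
SAMPLE_CMDLINE = "BOOT_IMAGE=/boot/vmlinuz-5.10.0-9-amd64 ro quiet splash nopat"

params = kernel_params(SAMPLE_CMDLINE)
print("nopat set:", "nopat" in params)
```

On the guest itself, read /proc/cmdline and pass its contents to kernel_params.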

Why is it more of an academic question? What’s missing in Wayland?

I guess your goal is a fully working system. So while it might be interesting to know whether wayland works on a half-working system, it doesn’t help in reaching that goal.

You say that xorg needs MTRR / PAT; you don’t say that the nvidia driver needs them. But I’ve been able to use the nouveau driver with the 2080 Ti passed through, and xorg worked correctly. So the obvious question is: why does xorg not need MTRR / PAT when it uses nouveau, but need them when it uses the nvidia driver? Xorg is the same, and bhyve is the same (as far as the missing MTRR / PAT features are concerned). The only thing that changes is the driver.

I said the xorg driver (DDX) needs it and we’re talking about the nvidia driver here. I just mentioned xorg to make a contrast to the (nvidia) kernel driver noticeable.
Read: the nvidia DDX needs it.

Maybe you now understand the “academic question” a bit better, which would be: “does only the (nvidia) ddx or also the (nvidia) kernel drm kms depend on mtrr?”

MTRRs are a no-op in bhyve - memory is still cacheable, since that is now controlled by the secondary EPT tables. Maybe this is not good enough for the driver.

Hi @generix. We have enabled mtrr and pat, but it still does not work. We are investigating the reasons. In the meantime, I want to give you some details; maybe you can work out what could be wrong at the moment:

root@marietto-BHYVE:~# cpuid -l 0x40000000

CPU 0:
hypervisor_id = "� d "
CPU 1:
hypervisor_id = "� d "
CPU 2:
hypervisor_id = "� d "
CPU 3:
hypervisor_id = "� d "
CPU 4:
hypervisor_id = "� d "
CPU 5:
hypervisor_id = "� d "
CPU 6:
hypervisor_id = "� d "
CPU 7:
hypervisor_id = "� d "

As you can see, there’s no bhyve signature. Additionally, as you can see below, mtrr and pat are enabled, but there is also an unexpected value: the guest hypervisor is reported as off, and we don’t know why. With the previous change we made, with mtrr and pat off, the hypervisor was active. Could the guest hypervisor state being set to off be a consequence of having enabled them?

https://pastebin.ubuntu.com/p/fZnnSPrXfq/

Also, the Xorg log file no longer shows the previous error…

https://pastebin.ubuntu.com/p/4VRgWBP9CN/

this is the dmesg log file :

https://pastebin.ubuntu.com/p/9sMMM2gQPn/

let me know what you think.

Previously, when the kernel module worked, you were camouflaging as kvm; now you pretend to be bare-metal, but maybe not well enough. Does it work with mtrr + kvm camouflage?
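
For context on what "camouflaging" means here: the hypervisor_id printed by cpuid -l 0x40000000 is just the EBX, ECX and EDX registers of that leaf read as 12 bytes. A minimal Python sketch of the decoding follows. The two signature strings in the table are the ones I know of for KVM and bhyve, so treat them as assumptions, and the function names are hypothetical.

```python
import struct

# Assumed vendor signatures (12 bytes each) for CPUID leaf 0x40000000.
KNOWN_SIGNATURES = {
    "KVMKVMKVM\0\0\0": "KVM",
    "bhyve bhyve ": "bhyve",
}


def hypervisor_signature(ebx, ecx, edx):
    """Decode the 12-byte vendor string from the three 32-bit registers."""
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.decode("latin-1")


def describe(signature):
    """Name the hypervisor, or report that the signature is not recognised."""
    return KNOWN_SIGNATURES.get(signature, "unknown or hidden")


# Build register values from a signature, as a guest would read them.
ebx, ecx, edx = struct.unpack("<III", b"KVMKVMKVM\0\0\0")
print(describe(hypervisor_signature(ebx, ecx, edx)))
```

A garbled hypervisor_id like the one shown earlier would land in the "unknown or hidden" branch, which matches the "pretending to be bare-metal" case.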

Hello @generix.

We got it. My monitor turned on and Blender recognizes CUDA and the GPU. A heartfelt thanks to you too, because your advice has been precise and professional. Some pictures below:



Anyway, there is still something that does not work. The nvidia audio device that I pass through with bhyve does not work inside Debian. I’m talking about this device:

ppt1@pci0:2:0:1: class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f7 subvendor=0x19da subdevice=0x2503
    vendor     = 'NVIDIA Corporation'
    device     = 'TU102 High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA

In the bhyve parameters it looks something like this:

-s 3:1,passthru,2/0/1 \

This is the “error”. I doubt that it is fixable from within the Debian settings.

It might require some additional bhyve patches.

E.g. ACRN’s source code contains the following comment:

/* Some audio drivers get topology data from ACPI NHLT table.
 * For such drivers, we need to copy the host NHLT table to make it
 * available to the Guest OS. Most audio drivers don't need this by
 * default, when that's the case setting this macro to 0 will avoid
 * unexpected failures.
 * The cAVS audio needs this however, to enable this feature.
 */

Can you have a look at the earlier dmesg log messages to see if there are any errors related to the nvidia driver?