I require the assistance of the NVIDIA/Xorg wizards. This will be a long post. I will start with my computer’s specs:
CPU: Intel i9-10850K (with integrated graphics)
GPU: RTX 3070
OS: Fedora 34
Monitors: 1x BenQ EX2780Q 2560x1440 144hz, 1x QNIX QX2710
Use case: ETH mining, CUDA workloads, and having a usable computer while doing these.
Nvidia drivers installed through RPMFusion:
(base) [user@fedora ~]$ dnf list installed | grep -i nvidia
akmod-nvidia.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-5.12.6-300.fc34.x86_64.x86_64 3:465.31-1.fc34 @@commandline
kmod-nvidia-5.12.7-300.fc34.x86_64.x86_64 3:465.31-1.fc34 @@commandline
nvidia-persistenced.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
nvidia-xconfig.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64 3:465.31-1.fc34 @rpmfusion-nonfree-nvidia-driver
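For reference, my quick sanity check that CUDA is reachable from Python (assuming PyTorch is installed in the conda “base” environment shown in my prompt) is simply:

(base) [user@fedora ~]$ python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

which should print True followed by the card name.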
With this setup, PyTorch works as I would expect. In fact, with no monitor cables plugged into the 3070, it’s almost as if the GPU were running headless. While idle, the output of nvidia-smi
just shows that 5MB of VRAM is in use for something (if this matters, I will post it), and that’s it. However, I also want to enable Coolbits so I can overclock and adjust the fan settings. To enable Coolbits, I added a Xorg config file:
[root@fedora user]# cat /etc/X11/xorg.conf.d/10-nvidia.conf
Section "Module"
Load "modesetting"
EndSection
Section "Device"
Identifier "nvidia"
Driver "nvidia"
BusID "PCI:1@0:0:0"
Option "AllowEmptyInitialConfiguration"
Option "Coolbits" "12"
EndSection
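For context, the value 12 should be the sum of the Coolbits bits for manual fan control (4) and clock offsets (8). Once it works, the kind of commands I want to run look roughly like this (exact attribute names and performance-level handling may vary by driver version; the numbers are placeholder values, not my actual settings):

nvidia-settings -a "[gpu:0]/GPUFanControlState=1"    # take manual control of the fans
nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=70"    # fan speed in percent
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffsetAllPerformanceLevels=100"      # core offset in MHz
nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffsetAllPerformanceLevels=800" # memory offset in MHz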
Upon reboot, I could overclock and adjust my fans as I liked. However, I noticed very quickly that GNOME animations were smoother than before, so I figured I should try actually using the GPU. Mining does work with a respectable hashrate, but my computer becomes sluggish and stuttery to the point of being unusable. Unfortunately, the result is the same with PyTorch and CUDA workloads, so it can’t just be the mining software. nvidia-smi
now shows this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 75% 35C P5 15W / 150W | 518MiB / 7982MiB | 11% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2526 G /usr/libexec/Xorg 245MiB |
| 0 N/A N/A 2660 G /usr/bin/gnome-shell 263MiB |
| 0 N/A N/A 5959 G /usr/lib64/firefox/firefox 3MiB |
| 0 N/A N/A 21463 G /usr/lib64/firefox/firefox 3MiB |
+-----------------------------------------------------------------------------+
So it seems Xorg is also using the 3070, even though I have no monitors plugged into it, and some of GNOME’s rendering is being offloaded to the 3070. That would also explain why GNOME seems snappier now.
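For what it’s worth, this is how I’ve been checking which GPU the session is actually using (glxinfo comes from the glx-utils package on Fedora, if I remember correctly):

(base) [user@fedora ~]$ xrandr --listproviders                    # lists the modesetting and NVIDIA providers
(base) [user@fedora ~]$ glxinfo -B | grep -iE "vendor|renderer"   # which GPU renders GLX by default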
EDIT: After reading around and opening nvidia-settings, it seems that with the 10-nvidia.conf file in place, I’m now using PRIME. I’ve tried this new config file:
Section "ServerLayout"
Identifier "layout"
Screen 0 "intel"
Inactive "nvidia"
EndSection
Section "Screen"
Identifier "intel"
Device "intel"
EndSection
Section "Module"
Load "modesetting"
EndSection
Section "Device"
Identifier "intel"
Driver "modesetting"
EndSection
which I reasoned should be the way to disable PRIME. However, this doesn’t work: I’m back at square one, with Coolbits no longer working.
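To see what Xorg actually loaded with this config, grepping the Xorg log should help (the path is an assumption on my part; under GDM it may instead end up in ~/.local/share/xorg/ or the journal):

[root@fedora user]# grep -iE "nvidia|modeset" /var/log/Xorg.0.log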
EDIT 2: I’ve removed a lot of unnecessary information, since it seems all I really want to do is disable PRIME while keeping Coolbits enabled. The above xorg.conf file is incorrect. Here is what I have now:
Section "Module"
Load "modesetting"
Disable "dri3"
EndSection
#Section "Device"
# Identifier "intel"
# Driver "modesetting"
# BusID "PCI:0:2:0"
#EndSection
Section "Device"
Identifier "nvidia"
Driver "nvidia"
BusID "PCI:1:0:0"
Option "AllowEmptyInitialConfiguration"
Option "Coolbits" "12"
EndSection
# Section "Screen"
# Identifier "intel"
# Device "intel"
#EndSection
#Section "Screen"
# Identifier "nvidia"
# Device "nvidia"
#EndSection
#Section "ServerLayout"
# Identifier "Layout0"
# Screen 0 "intel"
# Screen 1 "nvidia"
#EndSection
With those sections commented out, I get PRIME; if I uncomment them, Coolbits doesn’t work. Some people on the internet have mentioned that I should be able to select the PRIME display in nvidia-settings, but that might be an Ubuntu thing? Either way, it doesn’t work for me. Here are some other posts that describe possible workarounds, but nothing has worked for me: link 1, link 2
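For what it’s worth, on Ubuntu that switching seems to be handled by the prime-select script from the nvidia-prime package, e.g.:

sudo prime-select query    # show the current mode
sudo prime-select intel    # switch rendering to the iGPU

but as far as I can tell Fedora/RPMFusion doesn’t ship an equivalent, so that route isn’t available to me.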
Surely I can’t be the only person who has wanted to disable PRIME before?