When an application is started, memory is allocated on all GPUs instead of on a single one. This affects both the X11 session and CUDA applications. I am not sure which setting causes this. It does not occur in the multi-user target, so I suspect it is related to the X11 session.
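To make the observation reproducible, here is a minimal sketch that compares the used memory per GPU before and after starting an application. It assumes nvidia-smi is on PATH. The function names (parse_gpu_memory, read_gpu_memory, memory_delta) are made up for illustration and are not part of any NVIDIA tool.

```python
import subprocess

# nvidia-smi query that prints one "index, used MiB" row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used' CSV rows into {gpu_index: used_MiB}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used = [field.strip() for field in line.split(",")]
        usage[int(index)] = int(used)
    return usage


def read_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and return {gpu_index: used_MiB}."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def memory_delta(before, after):
    """Return the change in used memory (MiB) per GPU between two snapshots."""
    return {index: after[index] - before[index] for index in before}


# Intended use (not executed here):
#   before = read_gpu_memory()
#   ... start the application under test ...
#   after = read_gpu_memory()
#   print(memory_delta(before, after))
```

If the delta is non-zero on every GPU after starting a single application, that matches the behaviour described above.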
System
2x NVIDIA Quadro RTX 4000
Fedora 33 (5.12.12-200.fc33.x86_64)
CUDA Version: 11.3
Driver Version: 465.19.01
Tried settings
/etc/X11/xorg.conf
...
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:101:0:0"
EndSection
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "MultiGPU" "off"
    Option "SLI" "off"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
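A note on the BusID value: xorg.conf expects the bus, device and function numbers in decimal, while lspci prints them in hexadecimal. The sketch below converts an lspci-style address into the Xorg form. The helper name lspci_to_xorg_busid and the sample address are assumptions for illustration, and any PCI domain prefix is simply ignored.

```python
def lspci_to_xorg_busid(address):
    """Convert an lspci address such as '65:00.0' to an Xorg BusID string.

    lspci prints bus, device and function in hexadecimal; xorg.conf
    expects decimal. A leading PCI domain ('0000:65:00.0') is dropped
    in this sketch.
    """
    parts = address.strip().split(":")
    bus_hex = parts[-2]
    device_hex, function_hex = parts[-1].split(".")
    bus = int(bus_hex, 16)
    device = int(device_hex, 16)
    function = int(function_hex, 16)
    return f"PCI:{bus}:{device}:{function}"


# Hypothetical sample address, chosen so the result matches the form used above.
print(lspci_to_xorg_busid("65:00.0"))
```

This only checks the number format; it does not say which of the two cards the address belongs to.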
/etc/X11/xorg.conf.d/10-nvidia.conf
Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    Option "SLI" "off"
    Option "BaseMosaic" "on"
    Option "PrimaryGPU" "yes"
    Option "MultiGPU" "off"
EndSection
Section "ServerLayout"
    Identifier "layout"
    Option "AllowNVIDIAGPUScreens"
EndSection
nvidia-smi