Trying to downgrade CUDA 10.1 to 9.2 on RHEL 7.6 - yum install cuda for 9.2 now hits a circular reference

Hi,
I’ve been trying to downgrade from CUDA 10.1 to CUDA 9.2 on a RHEL 7.6 box.
I started out trying to load them both… that caused quite a bit of grief.
I finally got back to where I started, with CUDA 10.1 running. I then uninstalled that and tried to install CUDA 9.2, and I’m now getting a circular reference in yum.
It removes xorg-x11-drv-nvidia-gl (and a few other X packages) because the installed nvidia-driver-418 obsoletes them.
It then throws an error when installing cuda-drivers-396, because that package depends on xorg-x11-drv-nvidia-gl, which yum just removed as obsoleted by 418.

How do I get CUDA 9.2 installed?


(background as to why: I’m trying to get the UCX that is packaged with HPC-X to run, but it appears to have a dependency on 9.2)

I may have finagled it out… Just in case anyone else ever gets tripped up similarly…
It looks like the cuda installer thought the 418 driver was still present, even though it had been uninstalled.

I had used the .run files to uninstall the drivers.
I then ran yum remove cuda.
When I installed nvidia 396 from the .run file, it messed up my graphical desktop.
I then uninstalled it using its .run file, which brought the desktop/X back.
Oddly, after the reboot, nvidia-smi still showed 396 as present (I’m not using the GPU for video).
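
For anyone retracing this, here is a rough sketch of how one might check what is still installed or loaded before trying again. It only uses rpm -qa, dkms status and lsmod; the helper name filter_gpu_pkgs is made up for illustration, and the grep pattern is just a guess at what is worth looking for.

```shell
#!/bin/sh
# Hypothetical helper: keep only NVIDIA/CUDA-related package names from stdin.
filter_gpu_pkgs() {
    grep -Ei 'nvidia|cuda' | sort
}

# Packages the rpm database still knows about (skipped if rpm is not present).
if command -v rpm >/dev/null 2>&1; then
    rpm -qa | filter_gpu_pkgs
fi

# Kernel modules registered with dkms, if dkms is installed.
if command -v dkms >/dev/null 2>&1; then
    dkms status
fi

# Kernel modules currently loaded.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | grep -i nvidia || echo "no nvidia kernel module loaded"
fi
```

A driver installed from a .run file does not show up in the rpm list, so a loaded nvidia module with no matching package would point to a leftover runfile install.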

Note: dkms is in the EPEL repository.

That is the point where I was getting my earlier error.
I used the following steps (I can’t cut and paste, so pardon typos):

yum remove nvidia-driver-418.67-4.el7.x86_64
yum remove nvidia-driver-libs-418.67-4.el7.x86_64
#Those two packages were causing the circular removals during the cuda install
#Removing those packages also removed dkms, so had to put it back
yum clean expire-cache
yum install dkms

Only the 9.2 rpm was installed at this point (I had removed the 10.1 one), so the following installed 9.2:

yum install cuda

As a side note, the hello_c sample in HPC-X still reports that UCX Protocol is not supported…

And this still messed up my X Window System; a reboot brought the problem out.
One has to be very careful with the xorg.conf file.
I have two almost identical systems. One has gnome but not xorg.
I tried to edit xorg.conf and didn’t manage to get it working.
I had to use grub to boot into a lower runlevel.
I need to use the onboard Intel video, and the nvidia card just for cuda.
The nvidia .run files all seem to try to make nvidia the video driver in use.
That conf file appears to control that selection.
I’m learning.
I would delete this message chain if it would let me.

I ended up with cuda 10.1 running… I’m giving up on the idea of downgrading to 9.2. Trying to get to that setup caused a lot of grief with x11. The box is still a bit flaky during boot, but it at least boots to gnome ok.

If you have 10.1 running, it should be a trivial matter to also install 9.2 alongside 10.1

Go to the CUDA Toolkit downloads page (“CUDA Toolkit 11.7 Update 1 Downloads | NVIDIA Developer”)

select the legacy releases page…CUDA 9.2

select the runfile installer for CUDA 9.2

Run the runfile installer to install CUDA 9.2, but select “no” when prompted to install the driver - your driver that works with 10.1 will also work for 9.2

Follow the instructions in section 7 of the linux install guide to “switch” from CUDA 9.2 to CUDA 10.1 - it’s just a matter of changing environment variables.
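
As a rough illustration of that environment-variable switch, something like the following could go in a shell session or profile. It assumes both toolkits sit in the runfile installer's default /usr/local/cuda-X.Y locations; the function name use_cuda is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: put one toolkit's bin and lib64 first in the search paths.
# Assumes the default install prefix /usr/local/cuda-<version>.
use_cuda() {
    cuda_prefix="/usr/local/cuda-$1"
    PATH="$cuda_prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$cuda_prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Select 9.2 for this shell; call use_cuda 10.1 instead to go back.
use_cuda 9.2
```

After that, nvcc --version should report whichever toolkit was selected last, since its directory comes first in PATH.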

I made the mistake of rebooting again… I can’t get the nvidia driver to load now.
“ERROR: Unable to load the ‘nvidia-drm’ kernel module”.
“nvidia: Unknown parameter ‘NVreg_enableStreamMemOPs’”.
It may have gotten mixed up with v430. Cuda loads v418. dkms status doesn’t report anything now either. I’m at a bit of a loss. Is that parameter something added with v430? How do I fix this?
The installation log doesn’t help any. I’ve tried uninstalling v430, v418, and cuda, and then reinstalling v418 with and without the kernel module. I can’t find anything on the web for this one.
RHEL 7.6.
How does one clean up dkms after installing an older nvidia driver without removing the previous one first?
I’m not sure how I even managed this last problem. It was working. Murphy rules…
Any suggestions?

In the end, the simple fix was:
“X -configure”
which creates an xorg.conf.new file under /root.
Back up your old xorg.conf under /etc/X11/, then copy the new one to xorg.conf.
Then reboot.
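
The steps above could be scripted roughly like this. It is only a sketch: the function name install_xorg_conf and the .bak suffix are my own choices, and it assumes X -configure has already been run as root with no X server running.

```shell
#!/bin/sh
# Hypothetical helper: install_xorg_conf NEW_FILE CONF_DIR
# Backs up CONF_DIR/xorg.conf (if present) to xorg.conf.bak, then copies
# NEW_FILE into place as CONF_DIR/xorg.conf.
install_xorg_conf() {
    new_conf="$1"
    conf_dir="$2"
    [ -f "$new_conf" ] || { echo "missing $new_conf" >&2; return 1; }
    if [ -f "$conf_dir/xorg.conf" ]; then
        cp -p "$conf_dir/xorg.conf" "$conf_dir/xorg.conf.bak" || return 1
    fi
    cp "$new_conf" "$conf_dir/xorg.conf"
}

# On the real box, as root:
#   X -configure
#   install_xorg_conf /root/xorg.conf.new /etc/X11
#   reboot
```

Keeping the backup next to the live file makes it easy to copy it back from a lower runlevel if X fails to start.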
I’ve got things running with 9.2 again.
Apologies for the clutter.

(RHEL 7.6)