Installing CUDA toolkit and NVIDIA graphics drivers on Ubuntu 22.04 with 6.1.0-1020-oem kernel

Hi, I have tried to install the CUDA toolkit on my Lenovo IdeaPad 5 Pro with a Ryzen 9 and a CUDA-capable GeForce RTX 3050 Mobile card. It is obviously far from trivial. I tried the following guide:

I went through the checklist of requirements, and it says you need kernel 6.2. However, I have 6.1.0-1020-oem, because otherwise my laptop can’t function correctly with sleep mode, etc.
I have some questions regarding the recommended way of installing CUDA development libraries.

  1. Is it OK to have the 6.1.0-1020-oem kernel?
  2. I removed the proprietary NVIDIA drivers, and I want to install whichever version gives the best chance that the CUDA development libraries will not conflict with it. Which version should I choose?
  3. Should I go for the .deb packages or the runfile?
  4. Should I go for the “open” or “legacy” version of the drivers/toolkit? (Sorry, I am a bit confused.)
  5. Is it possible to run Wayland after the installation? (I can give it up if necessary.)

The reason I want to install the CUDA development libraries and the nvcc compiler is that I want to learn GPU-accelerated programming in order to explore heavy numerical computations: not necessarily machine learning or gaming, just various numerical/math-related interests. In particular, I want to try combining Rust with CUDA programming.
What is the best bet to get it working on Ubuntu 22.04? Is there any guide/script out there that is known to work?

Best regards,
David

  1. Yes, it should be.

  2. Install whatever drivers come with your installation method.

  3. There is no single right answer here. Some people like the package manager methods because they typically pull in needed dependencies during installation. The use of package managers to install/update software is not unique or specific to CUDA, so you may want to study the reasons people use them. However, the runfile method is also pretty reliable and may be preferred if you want more detailed control over the install process.

  4. At this point in time, both are pretty “current”. Over time, the “legacy” option may become less widely used. Either one should work fine for you, although the exact install steps are different.

  5. Yes, it should be possible; I have done it personally. However, there are specific install steps to make sure that your desktop GUI works correctly. People often find the .deb method easier for this, but the runfile method also has specific switches to make sure this happens. There are many questions here on the forums about it. I don’t have a detailed write-up for you, but the Linux install guide is a good start for background on some of these questions.

Thanks a lot!

I also wonder about the “udev rule” in the section about Power9:

There is an instruction saying that one should “Disable a udev rule installed by default in some Linux distributions …”. Ubuntu 18.04 is mentioned, but it is not clear to me what I should do here or whether it’s needed. Should I move the line found in
/lib/udev/rules.d/40-vm-hotadd.rules
to /etc/udev/rules.d and edit it? The instruction on how to do this editing is for RHEL 7.5, so what about Ubuntu?

Best regards, David

That section applies to systems that have an IBM Power9 CPU. Yours does not.

Sorry I didn’t notice it earlier. Radeon 3050 is not an NVIDIA GPU, and CUDA won’t work on it. None of this will work for you.

Thanks for your reply! I’m sorry I did not specify the name of my graphics card correctly.

I checked my graphics card as follows:

lspci | grep -i nvidia
01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)

Previously when I installed the NVIDIA drivers I could run nvidia-smi, and I got the impression I had 2048 CUDA cores. So it should be possible to program it…
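Once a driver is installed, the same check can be made from the driver side. The following is only a sketch: it assumes nvidia-smi is on the PATH, and gpu_summary is a made-up helper name. As far as I know, nvidia-smi does not print a CUDA core count directly, so this only reports the GPU name, driver version and memory.

```shell
# Sketch: summarize the GPU as seen by the NVIDIA driver.
# gpu_summary is a hypothetical helper name, not part of any NVIDIA tool.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One CSV line per GPU: name, driver version, total memory.
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
            || echo "nvidia-smi is installed but could not talk to the driver"
    else
        echo "nvidia-smi not found (is the NVIDIA driver installed?)"
    fi
}

gpu_summary
```

If the command is missing or cannot reach the driver, the lspci check above still tells you whether the hardware is visible on the PCI bus.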

Yes, an RTX 3050 is an NVIDIA GPU and is CUDA capable.

Hello! I tried installing with the runfile, but failed. I have kept the logs, but I don’t understand what is wrong, other than possibly 1) it doesn’t work with the oem kernel, 2) My kernel was compiled with another version of cc than the one used in the script, or 3) I passed the -m=kernel-open option?

The cuda_installer.log is reasonably short, so I include it here:

[INFO]: Adding driver option -m=kernel-open
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.18.3)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 545.23.08
[INFO]: Executing NVIDIA-Linux-x86_64-545.23.08.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd --kernel-module-build-directory=kernel-open  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 545.23.08 failed, quitting

The nvidia-installer.log is over 175k, so I’ll upload it, and hope that’s ok…

There are warnings that my cc is a different version from the one the kernel was compiled with, but I can’t tell whether that is the error or something else.

Could the error be triggered by the option -m=kernel-open? If I skip this flag, will I still be able to compile CUDA programs, or what is this needed for?

Best regards, David
nvidia-installer.log (171.1 KB)

Some system info, and more detail on what I did:

  - ubuntu release *ok*
    22.04.3
  - kernel *ok?*
    6.1.0-1020-oem
  - gcc version *ok*
    gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
  - glibc *ok*
    2.35

      sudo apt install linux-headers-$(uname -r)
      - log
        [...]
        linux-headers-6.1.0-1020-oem is already the newest version (6.1.0-1020.20).
        linux-headers-6.1.0-1020-oem set to manually installed.
        0 upgraded, 0 newly installed, 0 to remove and 123 not upgraded.

      sudo apt install linux-libc-dev
      - log
        [...]
        The following packages will be upgraded:
          linux-libc-dev
        [...]
        Preparing to unpack .../linux-libc-dev_5.15.0-94.104_amd64.deb ...
        Unpacking linux-libc-dev:amd64 (5.15.0-94.104) over (5.15.0-91.101) ...
        Setting up linux-libc-dev:amd64 (5.15.0-94.104) ...

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run

I booted into runlevel 3 (replaced by the systemd multi-user target in modern Ubuntu):

sudo systemctl set-default multi-user.target
reboot
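For readers used to SysV runlevels, the correspondence with systemd targets can be sketched as below. runlevel_to_target is a hypothetical helper written for illustration; the mapping follows the standard systemd runlevel aliases, and the systemctl call is read-only.

```shell
# Sketch: map a SysV runlevel to the systemd target that replaces it.
# runlevel_to_target is a made-up helper name used only for illustration.
runlevel_to_target() {
    case "$1" in
        0) echo poweroff.target ;;
        1) echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5) echo graphical.target ;;
        6) echo reboot.target ;;
        *) echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

echo "runlevel 3 -> $(runlevel_to_target 3)"
echo "runlevel 5 -> $(runlevel_to_target 5)"

# Read-only check of the current default target; guarded because
# systemctl may be absent or unusable (for example in a container).
if command -v systemctl >/dev/null 2>&1; then
    systemctl get-default 2>/dev/null || echo "could not query the default target"
fi
```

To get the desktop back after the installation, the default can be switched back with sudo systemctl set-default graphical.target followed by a reboot.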

Then I disabled the nouveau driver by

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
reboot
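After the reboot it can be worth confirming that the blacklist took effect before starting the installer. A minimal sketch, assuming a Linux system that exposes /proc/modules; module_loaded is a made-up helper name:

```shell
# Sketch: check whether a kernel module is currently loaded.
# module_loaded NAME [FILE] succeeds if NAME is listed in FILE
# (default: /proc/modules). The helper name is hypothetical.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    if module_loaded nouveau; then
        echo "nouveau is still loaded; the blacklist has not taken effect"
    else
        echo "nouveau is not loaded"
    fi
fi
```

This is equivalent in spirit to running lsmod | grep nouveau and expecting no output.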

In runlevel 3 I did:

sudo sh cuda_12.3.2_545.23.08_linux.run -m=kernel-open

After this, the error message came, and I have no clue how to proceed.

I thought of another option: if the problem is the OEM kernel, maybe I can boot a standard kernel, install the CUDA toolkit there, and compile the CUDA code there. Then I could boot the OEM kernel and execute the CUDA binaries on it. I can live with non-functional sleep mode (standard kernel) if it allows me to compile CUDA kernels, and then during normal operation with functional sleep mode (OEM kernel), I can run the code I’ve written.

There is a warning of impending trouble at the top of your driver install log, basically that the gcc version used to compile the kernel (12.x) does not match the one you have selected (11.x).

You need to fix that. I don’t have a recipe or instructions for you. Setting up gcc of a particular version is a fairly standard thing you can google for. Later in the log, you get an error like this:

cc: error: unrecognized command-line option '-ftrivial-auto-var-init=zero'

That is definitely due to the build system expecting gcc 12.x
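A quick way to spot this kind of mismatch before running the installer is to compare the compiler recorded in the kernel banner with the default gcc. This is only a sketch: it assumes the banner in /proc/version mentions gcc (as Ubuntu kernels typically do), and kernel_gcc_major and default_gcc_major are hypothetical helper names.

```shell
# Sketch: compare the gcc major version that built the running kernel
# with the major version of the default gcc. Helper names are made up.

# Extract the gcc major version from a /proc/version-style banner.
# Handles forms such as "gcc version 11.4.0" and "gcc-12 (Ubuntu ...) 12.3.0".
kernel_gcc_major() {
    printf '%s\n' "$1" | sed -nE 's/.*gcc[^0-9]*([0-9]+).*/\1/p' | head -n 1
}

# Major version of whatever "gcc" currently resolves to.
default_gcc_major() {
    gcc -dumpversion 2>/dev/null | cut -d. -f1
}

if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
    k=$(kernel_gcc_major "$(cat /proc/version)")
    g=$(default_gcc_major)
    if [ -z "$k" ]; then
        echo "could not determine the kernel's compiler from /proc/version"
    elif [ "$k" != "$g" ]; then
        echo "mismatch: kernel built with gcc $k, default gcc is $g"
    else
        echo "ok: kernel and default gcc are both major version $g"
    fi
fi
```

In the situation described here, such a check would be expected to report a kernel built with gcc 12 against a default gcc 11.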

Thank you so much! Now it compiled successfully :-)
I already had gcc-12 installed, but I had to switch to it using update-alternatives. Sorry for my ignorance.

David
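The exact commands used for the switch are not given in the thread. One common pattern on Ubuntu looks roughly like the sketch below, which only prints the commands instead of running them. The priorities (120, 110) and the helper name gcc_alternatives_cmds are assumptions made for illustration, and the paths assume the compilers live in /usr/bin as gcc-12 and gcc-11.

```shell
# Sketch: print (not run) update-alternatives commands that register
# several gcc versions and select the first one as the default.
# gcc_alternatives_cmds is a hypothetical helper; priorities are arbitrary.
gcc_alternatives_cmds() {
    preferred=$1
    shift
    for v in "$preferred" "$@"; do
        echo "sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$v ${v}0"
    done
    echo "sudo update-alternatives --set gcc /usr/bin/gcc-$preferred"
}

# Prefer gcc-12 (matching the kernel's compiler), keep gcc-11 available.
gcc_alternatives_cmds 12 11
```

After running the printed commands, gcc --version should report 12.x. Using sudo update-alternatives --config gcc instead of --set gives an interactive menu. If gcc-12 is not present, it would first have to be installed, for example with sudo apt install gcc-12 g++-12.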

Hi again, and sorry for bothering you. The samples seem to work fine (yay!!! :-) ) when no graphics are involved, but when I tried this one, for instance, it failed. Maybe I have the wrong version of some X11 libraries?

In cuda-samples/Samples/5_Domain_Specific/fluidsGL:
(make succeeded)

./fluidsGL 
fluidsGL Starting...

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

GPU Device 0: "Ampere" with compute capability 8.6

CUDA device [NVIDIA GeForce RTX 3050 Laptop GPU] has 16 Multi-Processors
CUDA error at fluidsGL.cpp:467 code=999(cudaErrorUnknown) "cudaGraphicsGLRegisterBuffer(&cuda_vbo_resource, vbo, cudaGraphicsMapFlagsNone)" 
david@david-IdeaPad-5-Pro-16ACH6:~/sw_inst/CUDA/attempt_3/cuda-samples/Samples/5_Domain_Specific/fluidsGL$ echo $?
1
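The figure of 2048 CUDA cores mentioned earlier in the thread is consistent with this output: devices of compute capability 8.6 have 128 FP32 cores per multiprocessor, and 16 multiprocessors are reported. A small sketch of the arithmetic, where cores_per_sm is a made-up helper that covers only a few Ampere compute capabilities:

```shell
# Sketch: estimate the FP32 core count from the SM count and the
# compute capability. cores_per_sm is a hypothetical helper and
# deliberately covers only a few Ampere values.
cores_per_sm() {
    case "$1" in
        8.0) echo 64 ;;
        8.6|8.7) echo 128 ;;
        *) echo "unknown compute capability: $1" >&2; return 1 ;;
    esac
}

sm_count=16   # "16 Multi-Processors" from the sample output
cc=8.6        # "compute capability 8.6" from the sample output

echo "$(( sm_count * $(cores_per_sm "$cc") )) FP32 cores"
```

With the values from the log this prints 2048 FP32 cores.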

I attach the output of make, deviceQuery, and nvidia-smi in case that helps.

If this issue should be opened as a new topic, what category does it belong to?

Best regards,
David

make_fluidsGL.log (1.7 KB)
deviceQuery.log (2.7 KB)
nvidia-smi.log (1.7 KB)

Read the Linux install guide. Do a search on OpenGL and read everything that applies to it. This is a common problem; with a bit of searching you will find other forum posts asking about it. Basically, when you installed the driver, you did not do the setup needed to use OpenGL accelerated by the NVIDIA GPU driver, which is necessary for that sample code.

Be advised that this option may cause havoc with your laptop display. Again, you can find many forum articles. I don’t have a recipe for you. Unless this is the core of what you are trying to do, I would consider the option of leaving things as-is.

It works! Many thanks!
The graphical demos work: great and cool ! :-)

The following may be good for other Ubuntu users (who may have similar problems):

I read all the notes about OpenGL in the manual, but nvidia-xconfig didn’t work: it created an X config file, but also warned that “Package xorg-server was not found in the pkg-config search path”. Restarting resulted in a black screen. I removed the file and it worked again. Maybe it could work with some more investigation, but I used prime-select instead.

I installed nvidia-prime, and then used sudo prime-select nvidia to select nvidia as X driver.
Sometimes the system freezes, and the only way out is the power button, but after that it works again. Fractional scaling doesn’t work.
I switch back to amdgpu with sudo prime-select intel.
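To see which GPU is actually handling OpenGL after a prime-select switch, something like the following sketch may help. It assumes prime-select (from nvidia-prime) and glxinfo (from the mesa-utils package) are installed and that a graphical session is running; renderer_is_nvidia is a made-up helper name.

```shell
# Sketch: report the active PRIME profile and the OpenGL renderer.
# renderer_is_nvidia is a hypothetical helper: it succeeds if a
# glxinfo-style "OpenGL renderer string" line names an NVIDIA GPU.
renderer_is_nvidia() {
    printf '%s\n' "$1" | grep -i 'OpenGL renderer' | grep -qi 'nvidia'
}

if command -v prime-select >/dev/null 2>&1; then
    echo "PRIME profile: $(prime-select query 2>/dev/null)"
fi

if command -v glxinfo >/dev/null 2>&1; then
    # glxinfo needs a running display; tolerate failure quietly.
    info=$(glxinfo 2>/dev/null | grep 'OpenGL renderer' || true)
    if renderer_is_nvidia "$info"; then
        echo "OpenGL is rendered by the NVIDIA GPU"
    else
        echo "OpenGL is not rendered by the NVIDIA GPU (or no display is available)"
    fi
fi
```

Samples that use CUDA/OpenGL interop, such as fluidsGL, need the NVIDIA GPU to be the OpenGL renderer, so this is a quick way to check the state before running them.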
