Ubuntu 22.04.3 LTS Server, Tesla P100, Driver Version: 470.199.02, CUDA Version: 11.4

cemdede · August 19, 2023, 2:37am

Hi,

I added a Tesla P100 16GB to a Dell PowerEdge R730 server running Ubuntu 22.04.3 LTS Server.

uname -r
5.15.0-79-generic

uname -a
Linux atlas 5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

±----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
±----------------------------------------------------------------------------+

nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

When I try to install the CUDA toolkit, the installation removes driver 470.

How can I install CUDA?

Thank you!

cemdede · August 19, 2023, 2:39am

sudo lshw -c video
*-display
description: 3D controller
product: GP100GL [Tesla P100 PCIe 16GB]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:03:00.0
logical name: /dev/fb0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list fb
configuration: depth=32 driver=nvidia latency=0 mode=1440x900 visual=truecolor xres=1440 yres=900
resources: iomemory:3b80-3b7f iomemory:3bc0-3bbf irq:196 memory:91000000-91ffffff memory:3b800000000-3bbffffffff memory:3bc00000000-3bc01ffffff
*-display
description: VGA compatible controller
product: G200eR2
vendor: Matrox Electronics Systems Ltd.
physical id: 0
bus info: pci@0000:09:00.0
logical name: /dev/fb0
version: 01
width: 32 bits
clock: 33MHz
capabilities: pm vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=mgag200 latency=64 maxlatency=32 mingnt=16 resolution=1440,900
resources: irq:19 memory:90000000-90ffffff memory:92800000-92803fff memory:92000000-927fffff memory:c0000-dffff

sudo dmesg | grep nvidia
[ 10.643212] nvidia: loading out-of-tree module taints kernel.
[ 10.643247] nvidia: module license 'NVIDIA' taints kernel.
[ 10.663439] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 10.677660] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[ 10.815162] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 470.199.02 Thu May 11 11:46:10 UTC 2023
[ 10.818325] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 10.818351] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 1
[ 14.025381] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 14.030120] nvidia-uvm: Loaded the UVM driver, major device number 508.
[ 14.731673] audit: type=1400 audit(1692411947.088:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1587 comm="apparmor_parser"
[ 14.731678] audit: type=1400 audit(1692411947.088:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1587 comm="apparmor_parser"
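To read the loaded driver version out of a log like this one (for comparison with the Driver Version that nvidia-smi reports), a minimal Python sketch could look like the following. It assumes the `nvidia-modeset` banner format shown above; `driver_version_from_dmesg` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re
from typing import Optional

# Matches the banner printed when nvidia-modeset loads, e.g.
# "Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 470.199.02 ..."
BANNER = re.compile(r"Kernel Mode Setting Driver for UNIX platforms\s+(\d+(?:\.\d+)+)")


def driver_version_from_dmesg(log: str) -> Optional[str]:
    """Return the driver version from the last nvidia-modeset banner, or None."""
    matches = BANNER.findall(log)
    return matches[-1] if matches else None
```

Feed it the text of `sudo dmesg`; for the log above it should return "470.199.02", the same version shown in the thread title.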

cemdede · August 19, 2023, 3:02am

After running the commands from the NVIDIA CUDA download page:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

here is the output:

±--------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
±--------------------------------------------------------------------------------------+

nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
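One way to see why installing `cuda` conflicts with driver 470: the local installer's filename carries both the CUDA release (12.2.1) and the driver it bundles (535.86.10), and NVIDIA's `cuda` meta-package installs driver packages as well as the toolkit. The Python sketch below pulls the two versions apart and compares driver branches. It assumes the filename pattern of this one installer holds generally, which is an inference rather than anything documented in this thread; `parse_local_installer` and `driver_branch` are hypothetical helpers.

```python
import re

# Values taken from this thread.
INSTALLER = "cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb"
INSTALLED_DRIVER = "470.199.02"  # from nvidia-smi / dmesg


def parse_local_installer(name: str) -> tuple:
    """Return (cuda_version, bundled_driver_version) from a local installer name.

    Assumes the pattern <prefix>_<cuda>-<driver>-<rev>_<arch>.deb seen above.
    """
    m = re.search(r"_(\d+\.\d+\.\d+)-(\d+\.\d+\.\d+)-\d+_[^_]+\.deb$", name)
    if m is None:
        raise ValueError(f"unrecognised installer name: {name}")
    return m.group(1), m.group(2)


def driver_branch(version: str) -> int:
    """The driver branch is the first component, e.g. 470 for 470.199.02."""
    return int(version.split(".")[0])


cuda_version, bundled_driver = parse_local_installer(INSTALLER)
print(f"CUDA {cuda_version} ships with driver {bundled_driver}")
if driver_branch(bundled_driver) > driver_branch(INSTALLED_DRIVER):
    print(f"Bundled branch {driver_branch(bundled_driver)} is newer than installed "
          f"{driver_branch(INSTALLED_DRIVER)}")
```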

cemdede · August 19, 2023, 3:48pm

Solved:

First, I installed a newer driver and CUDA version from:

Here are the commands from the download page above that I used to install both:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

CUDA is installed and shows up under /usr/local/cuda:

cd /usr/local/cuda

but

nvcc --version   # still says
Command 'nvcc' not found, but can be installed with:

To fix this, first check the PATH:

echo $PATH

It looks like CUDA is not on it:

/home/cem/anaconda3/bin:/home/cem/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
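The check above can be done programmatically: split PATH on ":" and look for the directory that holds nvcc. A minimal Python sketch using the PATH printed above; `on_path` is a hypothetical helper, and `/usr/local/cuda/bin` is the location this thread's install used.

```python
# PATH printed by `echo $PATH` above.
OBSERVED_PATH = (
    "/home/cem/anaconda3/bin:/home/cem/anaconda3/condabin:/usr/local/sbin:"
    "/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
)
CUDA_BIN = "/usr/local/cuda/bin"  # where the toolkit installed nvcc


def on_path(path_value: str, directory: str = CUDA_BIN) -> bool:
    """Return True if `directory` is one of the ':'-separated entries of path_value."""
    wanted = directory.rstrip("/")
    return any(entry.rstrip("/") == wanted for entry in path_value.split(":") if entry)


print(on_path(OBSERVED_PATH))                   # False
print(on_path(CUDA_BIN + ":" + OBSERVED_PATH))  # True
```

On the machine itself, `shutil.which("nvcc")` from the standard library answers the same question against the live PATH, returning None when nvcc cannot be found.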

We will edit the PATH in ~/.bashrc:

nano ~/.bashrc

Add this line to the end of the file:

export PATH=/usr/local/cuda/bin:$PATH

Save and exit (Ctrl+O, Enter, Ctrl+X).
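If you would rather script the edit than use nano, a Python sketch that appends the export line only when it is not already present avoids duplicate entries on repeated runs. `ensure_line` is a hypothetical helper; like the nano edit, it only changes the file, so the shell still has to re-read it.

```python
from pathlib import Path

EXPORT_LINE = "export PATH=/usr/local/cuda/bin:$PATH"


def ensure_line(rc_file: Path, line: str = EXPORT_LINE) -> bool:
    """Append `line` to rc_file unless an identical line exists. Return True if the file changed."""
    text = rc_file.read_text() if rc_file.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    with rc_file.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new line separate from an unterminated last line
        f.write(line + "\n")
    return True
```

Usage: `ensure_line(Path.home() / ".bashrc")` returns True the first time and False on later runs.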

Reload the file in the current shell:

source ~/.bashrc

Check that it worked:

which nvcc
/usr/local/cuda/bin/nvcc

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jul_11_02:20:44_PDT_2023
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0

Yes, it worked.
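To check the toolkit version from a script instead of by eye, the `release` line of the `nvcc --version` output can be parsed. A minimal Python sketch, assuming the output format shown above; `nvcc_release` and `installed_nvcc_release` are hypothetical helpers.

```python
import re
import shutil
import subprocess
from typing import Optional


def nvcc_release(output: str) -> Optional[str]:
    """Extract 'major.minor' from nvcc --version text, e.g. '12.2'."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None


def installed_nvcc_release() -> Optional[str]:
    """Run nvcc if it is on PATH; return None when it cannot be found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # the "Command 'nvcc' not found" situation from earlier
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True).stdout
    return nvcc_release(out)
```

With the output above, `nvcc_release` returns "12.2".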

But still

import torch
print(torch.__version__)
print(torch.cuda.is_available())

2.0.1
False

Uninstall torch, torchvision, and torchaudio, then reinstall them:

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu122/torch_stable.html

reboot the machine

sudo reboot now

Check that it is installed:

import torch

print("Torch version:", torch.__version__)
print("CUDA version used by PyTorch:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("Number of CUDA devices:", torch.cuda.device_count())
print("Current CUDA device:", torch.cuda.current_device())

Torch version: 2.0.1+cu117
CUDA version used by PyTorch: 11.7
CUDA available: True
Number of CUDA devices: 1
Current CUDA device: 0

Taaaa DAAAaaa !
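The `+cu117` suffix on the torch version is the CUDA release the wheel was built against, which is why `torch.version.cuda` reports 11.7 rather than nvcc's 12.2: PyTorch pip wheels ship their own CUDA runtime, so the system toolkit version does not have to match, as long as the driver is new enough. A small Python sketch that decodes the suffix; `cuda_from_torch_version` is a hypothetical helper and assumes the `cuXYZ` tag convention.

```python
from typing import Optional


def cuda_from_torch_version(version: str) -> Optional[str]:
    """Decode the local tag of a torch version string.

    '2.0.1+cu117' -> '11.7'; CPU-only ('+cpu') or untagged versions -> None.
    Assumes the last digit of the tag is the minor version.
    """
    _, _, local = version.partition("+")
    digits = local[2:]
    if not local.startswith("cu") or not digits.isdigit() or len(digits) < 2:
        return None
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_from_torch_version("2.0.1+cu117"))  # 11.7
```

On a CUDA build, `cuda_from_torch_version(torch.__version__)` should agree with `torch.version.cuda`.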

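Why does the CUDA version in nvidia-smi not match nvcc --version?

nvidia-smi shows the highest CUDA version that the installed NVIDIA driver supports, while nvcc --version shows the version of the installed CUDA toolkit, so the two numbers do not have to match.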
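Put as a rule of thumb: nvidia-smi reports a ceiling set by the driver, nvcc reports what the toolkit is, and a toolkit above the ceiling needs a newer driver. The Python sketch below is a conservative version of that rule, checked against this thread's values (driver 470 reports 11.4, the toolkit is 12.2). It ignores NVIDIA's minor version compatibility, which can let applications built with a newer minor release of the same major version run on an older driver, so treat `False` as a prompt to check rather than a hard failure. `major_minor` and `driver_covers` are hypothetical helpers.

```python
def major_minor(version: str) -> tuple:
    """'12.2.128' -> (12, 2)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def driver_covers(driver_cuda: str, toolkit_cuda: str) -> bool:
    """True if the toolkit's major.minor is at or below the driver's reported CUDA version."""
    return major_minor(toolkit_cuda) <= major_minor(driver_cuda)


print(driver_covers("11.4", "12.2"))  # False: 11.4 is below 12.2
```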
