Install CUDA 10.1 but nvidia-smi gets CUDA 10.2

yatchiu · February 28, 2020, 3:29am

I am using nvidia-docker on a remote Ubuntu 16.04 server.

When I run nvidia-smi, it shows CUDA 10.1. But when I look under /usr/local, there is no /usr/local/cuda, only /usr/local/cuda-10.0 (strangely, not cuda-10.1).
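For anyone reproducing this check, the toolkits actually present on disk can be listed with a short Python sketch. It assumes the default /usr/local/cuda-X.Y layout; the helper names find_cuda_toolkits and default_toolkit are hypothetical, and toolkits installed elsewhere will not be found.

```python
import os
import re
from pathlib import Path


def find_cuda_toolkits(root="/usr/local"):
    """Return {version: path} for every cuda-X.Y directory under root.

    Assumes the default install layout (/usr/local/cuda-X.Y); toolkits
    installed in other locations are not detected.
    """
    toolkits = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return toolkits
    for entry in sorted(root_path.iterdir()):
        match = re.fullmatch(r"cuda-(\d+\.\d+)", entry.name)
        if match and entry.is_dir():
            toolkits[match.group(1)] = str(entry)
    return toolkits


def default_toolkit(root="/usr/local"):
    """Resolve the <root>/cuda symlink, or return None if it is absent."""
    link = Path(root) / "cuda"
    return os.path.realpath(str(link)) if link.exists() else None


if __name__ == "__main__":
    print("installed toolkits:", find_cuda_toolkits())
    print("default (cuda symlink):", default_toolkit())
```

On a machine like the one described here, the first line would be expected to list only 10.0 and the second to report None, since there is no /usr/local/cuda.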

I am using PyTorch 1.0.0 (CUDA 10), and when I ran the program from this repository (GitHub - jwyang/faster-rcnn.pytorch: A faster pytorch implementation of faster r-cnn), branch pytorch-1.0, I got:

THCudaCheck FAIL file=/home/rizhao/projects/SMITHS/code/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu line=297 error=98 : unrecognized error code
Traceback (most recent call last):
  File "trainval_net.py", line 321, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/rizhao/anaconda3/envs/smith/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rizhao/projects/SMITHS/code/faster-rcnn.pytorch/lib/model/faster_rcnn/faster_rcnn.py", line 77, in forward
    pooled_feat = self.RCNN_roi_align(base_feat, rois.view(-1, 5))
  File "/home/rizhao/anaconda3/envs/smith/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rizhao/projects/SMITHS/code/faster-rcnn.pytorch/lib/model/roi_layers/roi_align.py", line 58, in forward
    input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
  File "/home/rizhao/projects/SMITHS/code/faster-rcnn.pytorch/lib/model/roi_layers/roi_align.py", line 20, in forward
    output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
RuntimeError: cuda runtime error (98) : unrecognized error code at /home/rizhao/projects/SMITHS/code/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu:297
Segmentation fault (core dumped)

Therefore, I thought I should install CUDA 10.1.
So I followed the instructions (CUDA Toolkit 10.1 Original Archive | NVIDIA Developer) to install CUDA 10.1 with the deb (local) installer:

Installation Instructions:
sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

However, I found that this would install CUDA 10.2.
So I changed sudo apt-get install cuda to sudo apt-get install cuda-10.1

Although /usr/local/cuda-10.1 now exists, nvidia-smi still showed CUDA 10.2 when I ran it.
Then I ran the program again, and there were still problems.

Then I ran sudo apt-get purge '*nvidia*' followed by sudo apt-get install cuda-10.1. Guess what,
something looked wrong. It showed:

sudo apt-get install cuda-10.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'cuda-10-1' for regex 'cuda-10.1'
The following additional packages will be installed:
accountsservice acpid apg aptdaemon avahi-daemon avahi-utils bbswitch-dkms bind9-host bluez bluez-obexd cheese-common cracklib-runtime crda
cuda-command-line-tools-10-1 cuda-compiler-10-1 cuda-cudart-10-1 cuda-cudart-dev-10-1 cuda-cufft-10-1 cuda-cufft-dev-10-1 cuda-cuobjdump-10-1
cuda-cupti-10-1 cuda-curand-10-1 cuda-curand-dev-10-1 cuda-cusolver-10-1 cuda-cusolver-dev-10-1 cuda-cusparse-10-1 cuda-cusparse-dev-10-1
cuda-demo-suite-10-1 cuda-documentation-10-1 cuda-driver-dev-10-1 cuda-drivers cuda-gdb-10-1 cuda-gpu-library-advisor-10-1 cuda-libraries-10-1
cuda-libraries-dev-10-1 cuda-license-10-1 cuda-license-10-2

cuda-memcheck-10-1 cuda-misc-headers-10-1 cuda-npp-10-1 cuda-npp-dev-10-1
cuda-nsight-10-1 cuda-nsight-compute-10-1 cuda-nsight-systems-10-1 cuda-nvcc-10-1 cuda-nvdisasm-10-1 cuda-nvgraph-10-1 cuda-nvgraph-dev-10-1
cuda-nvjpeg-10-1 cuda-nvjpeg-dev-10-1 cuda-nvml-dev-10-1 cuda-nvprof-10-1 cuda-nvprune-10-1 cuda-nvrtc-10-1 cuda-nvrtc-dev-10-1 cuda-nvtx-10-1

The cuda-license-10-2 package confuses me. I don't know how to install the correct version of CUDA and run my program.
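To spot which packages in a list like the one above fall outside the requested 10.1 set, the names can be filtered by their version suffix. This is only a sketch: the function name packages_outside_version is hypothetical, the sample input is a shortened excerpt of the apt output above, and it does not explain why apt selected the package.

```python
import re


def packages_outside_version(package_names, wanted="10-1"):
    """Return CUDA packages whose trailing version suffix differs from `wanted`.

    Only names ending in -<major>-<minor> are considered; unversioned
    packages such as cuda-drivers are ignored.
    """
    strays = []
    for name in package_names:
        match = re.fullmatch(r"(cuda-.+?)-(\d+-\d+)", name)
        if match and match.group(2) != wanted:
            strays.append(name)
    return strays


# Shortened excerpt of the package list printed by apt-get.
apt_output = """
cuda-libraries-dev-10-1 cuda-license-10-1 cuda-license-10-2
cuda-driver-dev-10-1 cuda-drivers cuda-gdb-10-1
"""
print(packages_outside_version(apt_output.split()))  # ['cuda-license-10-2']
```

Running apt-cache rdepends cuda-license-10-2 afterwards should show which package pulls it in.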

lukee2ni6 · February 26, 2021, 4:02pm

nvidia-smi just gives you driver information, so it shows you the highest CUDA version that your driver supports. It doesn't actually tell you anything about your CUDA install. Yes, it's confusing.
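A rough way to see the three separate version numbers side by side is sketched below. It assumes the usual "CUDA Version: X.Y" field in the nvidia-smi header and the "release X.Y" string in nvcc --version output; the helper names are hypothetical, and nvcc is only found if the toolkit's bin directory is on PATH.

```python
import re
import shutil
import subprocess


def parse_driver_cuda_version(nvidia_smi_output):
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi header.

    This is the highest CUDA version the driver supports, not the toolkit.
    """
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


def parse_nvcc_release(nvcc_output):
    """Extract 'release X.Y' from `nvcc --version` output (the toolkit version)."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def run(cmd):
    """Return the command's output, or None if it is not on PATH or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    smi = run(["nvidia-smi"])
    nvcc = run(["nvcc", "--version"])
    print("driver supports up to CUDA:",
          parse_driver_cuda_version(smi) if smi else "nvidia-smi not available")
    print("toolkit on PATH (nvcc):",
          parse_nvcc_release(nvcc) if nvcc else "nvcc not available")
    try:
        import torch
        # The CUDA version this PyTorch build was compiled against.
        print("PyTorch built against CUDA:", torch.version.cuda)
    except ImportError:
        print("PyTorch not importable in this environment")
```

The first number can legitimately be higher than the other two; the values worth comparing for a compiled extension are the nvcc release and torch.version.cuda.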
