I am new to Ubuntu, so apologies for any naive questions.
I am trying to install cuda-7.5 on my GPU server machine. Before installation, I removed all cuda and nvidia drivers using:
sudo apt-get remove --purge nvidia-*
sudo apt-get remove nvidia-cuda-toolkit
sudo apt-get remove --auto-remove nvidia-cuda-toolkit
sudo nvidia-uninstall
Then I followed these guidelines to install cuda-7.5:
Confirmation of the environment –
lspci | grep -i nvidia (Confirm that the information of NVIDIA's board is displayed)
uname -m (make sure that it is a x86_64)
gcc --version (make sure it is installed)
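For reference, the three checks above can be bundled into one script. This is only a sketch: the function names is_x86_64 and have_cmd are my own and not part of the guide, and the script only reports what it finds instead of stopping the install.

```shell
# Sketch of the pre-install checks; helper names are hypothetical.

# Succeeds only when the given architecture string is x86_64.
is_x86_64() {
    [ "$1" = "x86_64" ]
}

# Succeeds when the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if is_x86_64 "$(uname -m)"; then
    echo "arch: OK"
else
    echo "arch: not x86_64"
fi

if have_cmd gcc; then
    echo "gcc: found"
else
    echo "gcc: missing"
fi

# lspci may be absent on minimal systems, so guard the call.
if have_cmd lspci && lspci | grep -qi nvidia; then
    echo "gpu: NVIDIA device listed"
else
    echo "gpu: no NVIDIA device listed (or lspci unavailable)"
fi
```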
Installation of CUDA –
Downloaded the cuda_7.5.18_linux.run file from the NVIDIA Developer CUDA Toolkit downloads page and ran the following commands –
sudo apt-get install build-essential
sudo vi /etc/modprobe.d/blacklist-nouveau.conf
Then added the following lines to that file:
blacklist nouveau
options nouveau modeset=0
sudo update-initramfs -u
Rebooted server using init 6
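For anyone reproducing the blacklist step without an editor, the file could be written as below. This is a sketch that assumes the two directives belong on separate lines and that the second keyword is options, as modprobe.d expects. write_nouveau_blacklist is a made-up helper name, and the demo writes to a scratch file instead of /etc.

```shell
# Writes the two nouveau directives to the given file, one per line.
# Hypothetical helper; the target path is passed in by the caller.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Demo against a scratch file so nothing under /etc is touched.
demo_file="$(mktemp)"
write_nouveau_blacklist "$demo_file"
cat "$demo_file"
```

For the real file, the same printf output can be piped to sudo tee /etc/modprobe.d/blacklist-nouveau.conf, followed by sudo update-initramfs -u and a reboot.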
Then installed cuda using
chmod a+x cuda_7.5.18_linux.run
sudo service lightdm stop
sudo bash cuda_7.5.18_linux.run --no-opengl-libs
During the install –
a. Accepted EULA conditions
b. YES to installing the NVIDIA driver
c. YES to installing CUDA Toolkit + Driver
d. YES to installing CUDA Samples
After installation, I checked the following:
cd /dev (contains nvidiactl and nvidia0 files)
Set Environment path variables –
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
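The two exports above work, but if LD_LIBRARY_PATH starts out empty the result ends with a colon, and the dynamic loader treats an empty entry as the current directory. A small sketch that avoids this and does not add the same directory twice; prepend_path is a hypothetical helper name, not something the CUDA installer provides.

```shell
# Prints $2 with directory $1 prepended, unless $1 is already listed.
# An empty $2 yields just $1, with no stray colon.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

PATH="$(prepend_path /usr/local/cuda-7.5/bin "$PATH")"
LD_LIBRARY_PATH="$(prepend_path /usr/local/cuda-7.5/lib64 "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```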
Then,
nvidia-smi (shows the driver is running with version 352.39)
Verified the driver version –
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 352.39 Fri Aug 14 18:09:10 PDT 2015
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
Verified the cuda version –
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
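To compare the two version reports in a script instead of by eye, the numbers can be pulled out with sed. A sketch, assuming the output formats shown above stay the same; nvrm_version and nvcc_release are names I made up for illustration.

```shell
# Reads NVRM version text on stdin and prints the driver version, e.g. 352.39.
nvrm_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Reads "nvcc -V" text on stdin and prints the toolkit release, e.g. 7.5.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Both lookups are guarded so the script also runs on a machine without the driver.
if [ -r /proc/driver/nvidia/version ]; then
    echo "driver:  $(nvrm_version < /proc/driver/nvidia/version)"
else
    echo "driver:  /proc/driver/nvidia/version not readable"
fi

if command -v nvcc >/dev/null 2>&1; then
    echo "toolkit: $(nvcc -V | nvcc_release)"
else
    echo "toolkit: nvcc not on PATH"
fi
```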
Switched lightdm back on again –
sudo service lightdm start
Create CUDA Samples –
a. Go to the NVIDIA_CUDA-7.5_Samples folder in a terminal
b. make
c. cd bin/x86_64/linux/release/
Now when I run ./deviceQuery, I get the following errors:
./deviceQuery Starting…
CUDA Device Query (Runtime API) version (CUDART static linking)
modprobe: ERROR: …/libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_367_uvm'
modprobe: ERROR: could not insert 'nvidia_367_uvm': Function not implemented
cudaGetDeviceCount returned 30
→ unknown error
Result = FAIL
I do not understand why modprobe is looking for nvidia_367_uvm when I have installed the nvidia 352.39 driver.
Please help me solve this problem. It looks to me as if some earlier installation of nvidia 367 is causing the problem. Any help is appreciated!
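One thing I can check in the meantime is whether anything on the system still refers to the 367 branch. The sketch below assumes the module name comes from leftover packages or modprobe configuration files, which I have not confirmed; module_branch and driver_branch are helper names I made up, and the two hard-coded values are copied from the output above.

```shell
# Branch number inside a versioned module name, e.g. nvidia_367_uvm -> 367.
module_branch() {
    printf '%s\n' "$1" | sed -n 's/^nvidia_\([0-9][0-9]*\)_uvm$/\1/p'
}

# Branch number of a driver version, e.g. 352.39 -> 352.
driver_branch() {
    printf '%s\n' "${1%%.*}"
}

wanted="$(module_branch nvidia_367_uvm)"   # name taken from the modprobe error
installed="$(driver_branch 352.39)"        # version reported by nvidia-smi

if [ "$wanted" != "$installed" ]; then
    echo "modprobe asks for branch $wanted, loaded driver is branch $installed"
fi

# Packages that still mention nvidia (dpkg exists only on Debian-like systems).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -i nvidia || echo "dpkg lists no nvidia packages"
fi

# modprobe configuration that still names the other branch.
for dir in /etc/modprobe.d /lib/modprobe.d; do
    if [ -d "$dir" ]; then
        grep -rn "nvidia_${wanted}" "$dir" || echo "no nvidia_${wanted} entries in $dir"
    fi
done
```

If either search turns up 367 entries, that would at least explain where modprobe gets the name from.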