Can somebody please clarify what is happening when I attempt to install the latest CUDA 10.1 toolkit?
It looks like installing only the CUDA toolkit, without the NVIDIA driver, puts files in both /path/to/toolkit and /usr/lib64. On top of that, some of the files are versioned 10.2 (see cuBLAS), and some carry a three-part version number (which TensorFlow <= 1.13 does not like).
I am using the runfile installer cuda_10.1.168_418.67_linux.run.
When I install CUDA 10.1, I do not get any libcublas.so.10.1 files. I checked both the toolkit root (/usr/local/cuda-10.1) and /usr/lib64.
In /usr/lib64 I do have other CUDA files, for example libcublas.so.10.2.0.168.
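For anyone who wants to repeat the check, here is a rough sketch of how I looked for the cuBLAS files. The function name find_cublas is just something I made up for this post, and the search pattern is my assumption about how the files are named.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CUDA installer): print every cuBLAS
# shared object found under the directories passed as arguments.
find_cublas() {
    local d
    for d in "$@"; do
        [ -d "$d" ] || continue                 # ignore directories that do not exist
        find "$d" -name 'libcublas*.so*' -print
    done
}
```

Calling it as `find_cublas /usr/local/cuda-10.1 /usr/lib64` should show which of the two locations actually received the cuBLAS libraries.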
I don’t understand why cuBLAS gets installed to a system location, /usr/lib64. Previously I could set the install location from the runfile, and I used /usr/local/cuda-X.Y for each version. The whole CUDA runtime then lived there, and I could switch between versions.
As a workaround, I downloaded the runfile and extracted it using:
./cuda_10.1.168_418.67_linux.run --extract=extracted
The extracted folder contains a cuda-toolkit directory, which I copied to /usr/local/cuda-10.1. After that, I needed to create symbolic links for all of the .so files that didn’t have a .so.10.1 version (including the BLAS files that are 10.2.0.168 and 10.1.168).
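The symlink step looked roughly like the sketch below. This is not an official procedure: the function name link_cuda_libs is made up, and the assumption is that each library only needs an extra libNAME.so.10.1 name pointing at the versioned file that was actually installed. Names that already exist are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every versioned lib*.so.* file in a directory,
# create a lib*.so.<ver> symlink next to it if that name does not exist yet.
link_cuda_libs() {
    local libdir="$1" ver="${2:-10.1}"
    local f base link
    for f in "$libdir"/lib*.so.*; do
        [ -f "$f" ] || continue              # glob matched nothing, or not a file
        [ -L "$f" ] && continue              # only link to real files, not to links
        base="$(basename "$f")"
        link="$libdir/${base%%.so.*}.so.$ver"
        # e.g. libcublas.so.10.2.0.168 gets a libcublas.so.10.1 alias
        [ -e "$link" ] || ln -s "$base" "$link"
    done
}
```

I would run something like `link_cuda_libs /usr/local/cuda-10.1/lib64 10.1`, where the lib64 subdirectory is an assumption about where the libraries end up after the copy. Pointing a 10.1 name at a 10.2 cuBLAS only works as long as the library stays compatible with what TensorFlow expects, so treat it as a workaround.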
That is probably TensorFlow-specific, as I could compile r1.14 using the new CUDA setup, but r1.13 (the current stable version) needed the 10.1 symlinks.
Thank you