cuda 10.1 install path

bernard.at.spark · February 27, 2019, 6:37am

The CUDA 10.1 .run file installer has a new interface. How do I change the install path to somewhere in my $HOME instead of /usr/local?

I choose Options > Root install path, enter a local directory, then select Done. But when I press Install, the installer complains that it doesn’t have permission to install in /usr/local and quits.

OS is Ubuntu 16.04

chemal · March 1, 2019, 2:14am

The new installer is completely broken. Look at the log file to see what it really tries to do. Don’t run as root or it will mess up your system.

In theory you type something like this:

$ sh cuda_10.1.105_418.39_linux.run --silent --toolkit --toolkitpath=$HOME/tkit

But this doesn’t work: the new installer unconditionally tries to put cublas in the /usr hierarchy, quits with an error midway when /usr isn’t writable, and leaves you with an incomplete installation in toolkitpath.

If you fix this manually, you will find other surprises. 10.1 doesn’t work with gcc 7 (10.0 did), and it also doesn’t work with gcc 8 (the docs say both are supported). (I tested this on RHEL, not Ubuntu.)

AndyDick · March 1, 2019, 7:31pm

At a high level, the runfile installer tries to install three types of files:

1. Toolkit files in the toolkit directory
2. Toolkit files which have been moved outside of the toolkit directory
3. Non-toolkit files that are useful for toolkit use (.pc files, /usr/local/cuda symlink, etc.)

The files in #1 can be configured via --toolkitpath=<path> on the command line, or via the toolkit advanced options menu in the GUI by selecting “CUDA Toolkit 10.1” and hitting the ‘a’ button. (Similar for the “CUDA Samples 10.1” selection).
The files in #2 can be configured via --defaultroot=<path> on the command line, or via the options menu in the GUI at the bottom, below the ‘Install’ selection.
Some of the files in #3 can be toggled on/off via the “CUDA Toolkit 10.1” advanced options, but they shouldn’t cause the install to fail when the installer runs without root permissions.

Some of these options aren’t obvious or well-documented, and will be improved upon down the road as soon as we can.

chemal · March 2, 2019, 2:36am

Thanks for the reply. I verified that

sh cuda_10.1.105_418.39_linux.run --silent --toolkit --toolkitpath=$HOME/tkit --defaultroot=$HOME/tkit --samples --samplespath=$HOME/tkit/samples

gives back the old layout. By the way: what was wrong with it?
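After an install like the one above, the shell still has to be told where the toolkit lives. A minimal sketch, assuming the toolkit ended up in $HOME/tkit with the usual bin and lib64 subdirectories; `CUDA_HOME` is only a naming convention used here, not something the installer sets or reads:

```shell
# Assumption: the toolkit was installed with --toolkitpath=$HOME/tkit.
# CUDA_HOME is a conventional variable name, chosen here for illustration.
CUDA_HOME="${CUDA_HOME:-$HOME/tkit}"
export CUDA_HOME

# Put nvcc and the toolkit libraries ahead of any system-wide copy.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Sanity check: report which nvcc the shell will pick up, if any.
command -v nvcc || echo "nvcc not found under $CUDA_HOME/bin" >&2
```

Putting these lines in ~/.bashrc (or a small file that is sourced on demand) makes it easy to switch between several toolkit versions installed side by side.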

More problems:

1. How do I get the pkg-config files back?
2. What should be done with the old-style rhel6 installer? It doesn’t understand ‘--defaultroot’ but now also insists on putting cublas in /usr.
3. 10.1 produces code that neither gcc 7 nor gcc 8 can compile (at least on RHEL 6 and 7, and most probably elsewhere). Is this already known? Should it be reported? Where?
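Until the installer's own pkg-config files are available in a user install, one possible stopgap is to write a .pc file by hand and point PKG_CONFIG_PATH at it. The sketch below is an assumption-heavy illustration: the file name `cudart-10.1.pc`, its contents, the lib64/include layout, and the helper name `write_cudart_pc` are all made up here and are not what NVIDIA ships.

```shell
# Write a hand-made pkg-config file for the CUDA runtime.
#   $1: toolkit prefix (for example $HOME/tkit)
#   $2: directory that will hold the .pc file
# Hypothetical stand-in, not the file the installer would have produced.
write_cudart_pc() {
    pc_prefix=$1
    pc_dir=$2
    mkdir -p "$pc_dir" || return 1
    # \${...} is escaped so pkg-config, not the shell, expands it later.
    cat > "$pc_dir/cudart-10.1.pc" <<EOF
prefix=$pc_prefix
libdir=\${prefix}/lib64
includedir=\${prefix}/include

Name: cudart
Description: CUDA runtime library (hand-written stand-in)
Version: 10.1
Libs: -L\${libdir} -lcudart
Cflags: -I\${includedir}
EOF
}

# Usage (commented out; the paths are examples):
#   write_cudart_pc "$HOME/tkit" "$HOME/tkit/pkgconfig"
#   export PKG_CONFIG_PATH="$HOME/tkit/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
#   pkg-config --cflags --libs cudart-10.1
```

Keeping the file under the toolkit prefix means it disappears together with the toolkit when that directory is removed.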

bernard.at.spark · March 3, 2019, 6:42am

@AndyDick

So I followed your explanation and set my install paths accordingly: CUDA toolkit in $HOME/opt/cuda_test/cuda, samples in $HOME/opt/cuda_test, and default root directory (for the files in #2 of your post) in $HOME/opt/cuda_test/cuda.

I get

Completed with errors. See log at /tmp/cuda-installer.log for details.

The log is long and uninformative. There are a lot of warnings about not being able to write to /var/log; the only error I could spot is

[ERROR]: cuda.conf wasn't handled correctly during upgrade
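For a log this long, filtering on the error tag is a quick first pass. A minimal sketch, assuming errors are tagged `[ERROR]` as in the line above and that the log sits at the path the installer reported; `installer_errors` is a hypothetical helper name:

```shell
# Print only the [ERROR] lines of an installer log, with line numbers.
#   $1: log file (defaults to the path the installer printed)
installer_errors() {
    log_file=${1:-/tmp/cuda-installer.log}
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -n -F '[ERROR]' "$log_file"
}

# Usage (commented out):
#   installer_errors
#   installer_errors /tmp/cuda-installer.log | less
```

The line numbers make it easier to jump to the surrounding context in the full log.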

Trying to compile the samples gives some errors (gcc 5.4.0):

simpleCudaGraphs.cu(286): error: identifier "cudaStreamCaptureModeGlobal" is undefined

simpleCudaGraphs.cu(286): error: too many arguments in function call

2 errors detected in the compilation of "/tmp/tmpxft_00002c32_00000000-14_simpleCudaGraphs.compute_75.cpp1.ii".
Makefile:288: recipe for target 'simpleCudaGraphs.o' failed
make[1]: *** [simpleCudaGraphs.o] Error 1

Only a handful of samples are compiled.

I tried compiling the samples again with gcc 8.2, and it errored out saying gcc > 7 is not supported.

P.S. I agree 100% with chemal. I am not sure what is to be gained with the new installer, which is less intuitive and possibly buggy. The job of the installer is to install the software, and the old one does the job. Why add another layer of explanation, frustration, and probably bug fixing?
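When the default gcc is rejected, nvcc can be pointed at a different host compiler with its -ccbin option. A rough sketch of checking the version first; `gcc_major` and `gcc_within_limit` are hypothetical helpers, `g++-5` and the limit of 7 are placeholders, and which gcc versions 10.1 really accepts is not settled in this thread:

```shell
# Extract the major version from a dotted version string such as "8.2.0".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed if version $1 has a major number no greater than limit $2.
gcc_within_limit() {
    [ "$(gcc_major "$1")" -le "$2" ]
}

# Usage (commented out; compiler name and limit are placeholders):
#   gcc_within_limit "$(gcc -dumpversion)" 7 || echo "pick an older host compiler" >&2
#   nvcc -ccbin g++-5 -o vectorAdd vectorAdd.cu
```

The samples' Makefiles appear to take a HOST_COMPILER variable for the same purpose (an assumption worth checking in the Makefile itself before relying on it).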

bernard.at.spark · March 4, 2019, 1:24pm

Turns out the bug is in simpleCudaGraphs. After removing it from the samples, the rest built successfully.
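One way to take a single sample out of the build without deleting it is to move its directory out of the samples tree. A minimal sketch, assuming the sample lives in a directory named after it somewhere under the samples root; `set_aside_sample` and the paths in the usage lines are hypothetical:

```shell
# Move the sample directory named $2 out of the samples tree $1 into $3.
# $3 should be outside $1 so the samples build no longer sees the sample.
set_aside_sample() {
    tree=$1
    sample=$2
    parking=$3
    # Take the first directory with a matching name anywhere under the tree.
    found=$(find "$tree" -type d -name "$sample" | head -n 1)
    if [ -z "$found" ]; then
        echo "no sample named $sample under $tree" >&2
        return 1
    fi
    mkdir -p "$parking" && mv "$found" "$parking/"
}

# Usage (commented out; replace SAMPLES_ROOT with the real samples directory):
#   set_aside_sample "$SAMPLES_ROOT" simpleCudaGraphs "$HOME/opt/cuda_test/skipped"
#   (cd "$SAMPLES_ROOT" && make -k)
```

Running make with -k additionally keeps the build going past any other sample that fails, so one broken target does not hide the state of the rest.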

I installed cuda with

./cuda_10.1.105_418.39_linux.run --silent --toolkit --toolkitpath=$HOME/opt/cuda_test/cuda --defaultroot=$HOME/opt/cuda_test/cuda

No more error message, since I suppose it is suppressed by --silent, but the log file still shows

[ERROR]: cuda.conf wasn't handled correctly during upgrade

Probably not important.

thomas.orgis · May 10, 2019, 2:27pm

I failed to find an entry to fix the installation of pkg-config files myself … buried too deep in the installer.

Can we hope for a complete install coming back with the next release of CUDA? Along with the various problems people seem to have with gcc 7 and 8, I guess my HPC installation has to go on without this iteration of the CUDA toolkit for a while. We’re using 9.0 with gcc 6.4.0 on CentOS 7.6 and users seem happy (or at least silent).