Why, when given “--installpath=/opt/soft/cuda/11.5.0_495.29.05”, did the installer try to write to /usr/share/applications?
Why, when run by root with “--installpath=/opt/soft/cuda/11.5.0_495.29.05”, and with /opt and /opt/soft both owned by root and mode 0755, did the installer claim to have no write access to either location?
How, if not --installpath, do I control the install path? (Hint: it’s not any of the other “--*path” options either.)
Why does the installer probe for drivers when explicitly not told to install any drivers?
Environment:
CentOS 7
kernel 3.10.0-1160.45.1.el7.x86_64
full source build of gcc 11.2.0 in /opt/soft
cuda toolkit 11.5.0_495.29.05
When told explicitly to install everything in a specific place?
This is a bug. Actually it’s several bugs, because in addition to desktop files there are man pages and pkg-config files, all of which it tries to write to “somewhere-other-than-where-you-told-it-to-put-everything”. (The pkg-config files are particularly troublesome since they don’t get written into {installpath}/lib[64]/pkgconfig and get deleted when the installer finishes.) Per the NVIDIA install procedure, the --installpath option exists explicitly to relocate “everything”, so it should do so. If it can’t write something, it shouldn’t keep going just because that something is relatively low priority, like a man page or desktop link; it should fail, because it’s supposed to have write access and doesn’t. Assuming it’s trying to write where it’s been told to, that is…
BTW, something I noticed while stacktracing the installer to see what it was actually doing prior to those writes: the tmpdir argument doesn’t look like it gets sanitized. There’s no check for an absolute path there, but a relative path crashes the install partway through with the error log ending “{arg provided via --tmpdir}: no such file”, while an absolute tmpdir runs fine. If this needs to be absolute then a relative path should be rejected up front; otherwise it should be sanitized/expanded (usually via realpath, since just about everything implements something like it).
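For illustration, here is roughly what that expansion could look like in a wrapper script. This is only a sketch under my own assumptions: resolve_tmpdir is a hypothetical helper name, not anything inside the NVIDIA installer, and it uses cd plus pwd -P rather than realpath so it works where realpath(1) is missing.

```shell
# Hypothetical helper (not part of the CUDA installer): turn a user-supplied
# temp directory into an absolute, symlink-free path, or reject it early.
resolve_tmpdir() {
    if [ ! -d "$1" ]; then
        # Fail before any work is done instead of crashing partway through.
        echo "tmpdir '$1' is not an existing directory" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is untouched;
    # pwd -P prints the physical absolute path.
    (cd -- "$1" && pwd -P)
}
```

A wrapper could then call something like tmpdir="$(resolve_tmpdir "$user_arg")" || exit 1 and pass the result to --tmpdir, so the installer only ever sees an absolute path.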
Also BTW, there is a similar problem with the installpath. I discovered that the initial failure happened because the installpath directory was being pre-created by higher-level wrapper scripting, which causes the installer to throw the “no permission” error. According to the stacktrace, this is because the installer does not examine the return code from the attempt to create the last node in the installpath. This causes “failure because the target already exists” to be interpreted as “failure because the target could not be created”, which is then reported as “failure because permission is denied”. This may or may not be a bug. Most forms of UN*X I’ve seen in the last few decades implement some form of “install” that permits pre-creating the target dir so long as the final target node can still be written. But it’s never been a strongly uniform thing, so … maybe this just needs a better error message, or a preflight check for exists(installpath)?
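To make the preflight idea concrete, here is a minimal sketch of the kind of check I mean. It is not the installer’s actual logic; check_installpath and its return codes are hypothetical names I made up for the example. The point is only that “already exists”, “not writable”, and “cannot be created” are reported as three different things.

```shell
# Hypothetical preflight (not the installer's real code): classify the state
# of the install path instead of collapsing everything into "permission denied".
check_installpath() {
    if [ -e "$1" ] && [ ! -d "$1" ]; then
        echo "installpath '$1' exists but is not a directory" >&2
        return 2
    fi
    if [ -d "$1" ]; then
        # A pre-created directory is acceptable as long as it is writable.
        if [ -w "$1" ]; then
            return 0
        fi
        echo "installpath '$1' exists but is not writable" >&2
        return 3
    fi
    # The path does not exist yet: try to create it, including parents.
    if ! mkdir -p -- "$1" 2>/dev/null; then
        echo "installpath '$1' could not be created" >&2
        return 4
    fi
    return 0
}
```

With something like this run before the real work starts, a directory pre-created by wrapper scripting would pass cleanly, and a genuine failure would say which of the cases it actually was.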
When I look at the “Advanced Options” silent installation section of the CUDA 11.5.0 Installation Guide Linux, I don’t see an --installpath option. There are --toolkitpath=, --samplespath= and --defaultroot=.
Look at the installer. Or look at the log I already posted, which shows the option being used to override the other *path options. You can go back several versions of the installer and see the same thing. That web page is not up to date with the installer itself. That’s understandable, since installer functionality is obviously a higher priority than keeping the static web content in sync.
See the output log of a working run: it includes “[INFO]: Setting globalpath=…” when the only argument with that or any path is --installpath. Here’s another example:
[root@login00 005_cudatoolkit]# md5sum cuda_11.5.0_495.29.05_linux.run
888a4538c0d12a8be06279bbc6e3e9b0 cuda_11.5.0_495.29.05_linux.run
[root@login00 005_cudatoolkit]# ./cuda_11.5.0_495.29.05_linux.run --help
Options:
...
--librarypath=<path>
Install libraries to the <path> directory. If this flag is not provided,
the default path of your distribution is used. This flag only applies to
libraries installed outside of the CUDA Toolkit path.
--installpath=<path>
Install everything to the <path> directory. This flag sets the same values
as the toolkitpath, samplespath, and librarypath options.
--extract=<path>
Extracts driver runfile and the raw files of the toolkit and samples to
<path>.
...
[root@login00 005_cudatoolkit]# tmpdir="$(mktemp -d)"
[root@login00 005_cudatoolkit]# mount -t tmpfs -o size=20G tmpfs $tmpdir
[root@login00 005_cudatoolkit]# ./cuda_11.5.0_495.29.05_linux.run --silent \
--toolkit --samples --installpath=/opt/soft/cuda/11.5.0_495.29.05 --override \
--tmpdir=$tmpdir
[root@login00 005_cudatoolkit]# echo $?
0
[root@login00 005_cudatoolkit]# umount $tmpdir
[root@login00 005_cudatoolkit]# rmdir $tmpdir
[root@login00 005_cudatoolkit]# head -5 /var/log/cuda-installer.log
[INFO]: Setting silent=true
[INFO]: Setting toolkit=true
[INFO]: Setting samples=true
[INFO]: Setting globalpath=/opt/soft/cuda/11.5.0_495.29.05
[INFO]: Overriding compiler check
How much sense does it make to support relocation of the libraries but not the pkgconfig directory inside the libraries location? Or to support relocation of the samples but not the man pages?