CUDA Toolkit 11.5.0 installer: "no write access" to self-owned directories, writes to /usr/share/applications with --installpath=/opt/soft

Questions:

  1. Why, when given “--installpath=/opt/soft/cuda/11.5.0_495.29.05”, did the installer try to write to /usr/share/applications?
  2. Why, when run by root with “--installpath=/opt/soft/cuda/11.5.0_495.29.05”, did the installer claim to have no write access to /opt and /opt/soft, both of which are owned by root with mode 0755?
  3. How, if not --installpath, do I control the install path? (Hint: it’s not any of the other “--*path” options either.)
  4. Why does the installer probe for drivers when explicitly not told to install any drivers?

Environment:
CentOS 7
kernel 3.10.0-1160.45.1.el7.x86_64
full source build of gcc 11.2.0 in /opt/soft
cuda toolkit 11.5.0_495.29.05

Steps taken:
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/docs/sidebar/md5sum.txt -O cuda_11.5.0_495.29.05_md5sums.txt
umask 022
export PATH="/opt/soft/bin:$PATH"
export PYTHONPATH=/opt/soft/lib/python:$PYTHONPATH
export LD_LIBRARY_PATH="/opt/soft/libexec/gcc/x86_64-pc-linux-gnu/11.2.0:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/soft/lib/gcc/x86_64-pc-linux-gnu/11.2.0/plugin:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/soft/lib/gcc/x86_64-pc-linux-gnu/11.2.0:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/soft/lib64:$LD_LIBRARY_PATH"
chmod 0700 cuda_11.5.0_495.29.05_linux.run
mkdir tmp
./cuda_11.5.0_495.29.05_linux.run --silent --toolkit --samples --installpath=/opt/soft/cuda/11.5.0_495.29.05 --override --tmpdir="$(pwd)/tmp"
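
Note that the steps above download the md5sums file but never check the runfile against it. A minimal sketch of that check, assuming the sums file uses the usual md5sum layout (hash, whitespace, filename); verify_md5 is a hypothetical helper name, not something the installer provides:

```shell
# Compare a file's MD5 against its entry in a checksum list.
# $1 = file to check, $2 = checksum list ("<hash>  <filename>" per line, assumed).
verify_md5() {
    file="$1"
    sums="$2"
    name="$(basename "$file")"
    # Accept both text-mode ("name") and binary-mode ("*name") entries.
    expected="$(awk -v f="$name" '$2 == f || $2 == ("*" f) { print $1; exit }' "$sums")"
    if [ -z "$expected" ]; then
        echo "no checksum entry for $name in $sums" >&2
        return 2
    fi
    actual="$(md5sum "$file" | awk '{ print $1 }')"
    [ "$expected" = "$actual" ]
}
```

It would be called as: verify_md5 cuda_11.5.0_495.29.05_linux.run cuda_11.5.0_495.29.05_md5sums.txt || echo "checksum mismatch"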

Resulting /var/log/cuda-installer.log:
[INFO]: Setting silent=true
[INFO]: Setting toolkit=true
[INFO]: Setting samples=true
[INFO]: Setting globalpath=/opt/soft
[INFO]: Overriding compiler check
[INFO]: Driver installation detected by command: yum list installed | grep -e xorg-x11-drv-nvidia -e "nvidia-driver."
[INFO]: Initializing menu
[INFO]: Silent install option: skipping driver
[INFO]: Updating toolkitpath for: CUDA Toolkit 11.5
[WARNING]: Unable to write to directory: /usr/share/applications/
[INFO]: Updating toolkitpath for: CUDA Libraries 11.5
[INFO]: Updating toolkitpath for: CUDA Runtime 11.5
[INFO]: Updating toolkitpath for: cuda-cudart
[INFO]: Updating toolkitpath for: cuda-nvrtc
[INFO]: Updating toolkitpath for: libcublas11
[INFO]: Updating toolkitpath for: libcufft
[INFO]: Updating toolkitpath for: libcurand
[INFO]: Updating toolkitpath for: libcusolver
[INFO]: Updating toolkitpath for: libcusparse
[INFO]: Updating toolkitpath for: libnpp
[INFO]: Updating toolkitpath for: libnvjpeg
[INFO]: Updating toolkitpath for: CUDA Development 11.5
[INFO]: Updating toolkitpath for: cuda-cudart-dev
[INFO]: Updating toolkitpath for: cuda-driver-dev
[INFO]: Updating toolkitpath for: cuda-nvml-dev
[INFO]: Updating toolkitpath for: cuda-nvrtc-dev
[INFO]: Updating toolkitpath for: cuda-cccl
[INFO]: Updating toolkitpath for: libcublas-dev
[INFO]: Updating toolkitpath for: libcufft-dev
[INFO]: Updating toolkitpath for: libcurand-dev
[INFO]: Updating toolkitpath for: libcusolver-dev
[INFO]: Updating toolkitpath for: libcusparse-dev
[INFO]: Updating toolkitpath for: libnpp-dev
[INFO]: Updating toolkitpath for: libnvjpeg-dev
[INFO]: Components to install:
[INFO]: CUDA Toolkit 11.5
[WARNING]: Unable to write to directory: /opt
[ERROR]: Permission denied. Unable to write to /opt/soft/
[ERROR]: Install of CUDA Toolkit 11.5 failed, quitting

Directories with “no write permission”:
[root@login00 005_cudatoolkit]# ls -ld /opt /opt/soft
drwxr-xr-x. 6 root root 59 Nov 4 19:21 /opt
drwxr-xr-x. 11 root root 117 Nov 4 13:58 /opt/soft

Execution account:
[root@login00 log]# id
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0
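
Mode bits and uid alone don't prove a directory is writable: a read-only mount or an immutable attribute can block even root. One way to rule those out is to attempt a real write. A minimal sketch follows; can_write_dir is a hypothetical helper name:

```shell
# Return 0 if a file can actually be created in directory $1, else 1.
can_write_dir() {
    dir="$1"
    # mktemp fails if the directory is missing, read-only, or otherwise blocked.
    probe="$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null)" || return 1
    rm -f "$probe"
}

for d in /opt /opt/soft; do
    if can_write_dir "$d"; then
        echo "$d: writable"
    else
        echo "$d: NOT writable"
    fi
done
```

If both directories report as writable here, the installer's "Permission denied" message is describing something other than an actual permissions failure.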

Looking at one of my installs, it seems the installer writes a few *.desktop files (for the Nsight apps etc.), I assume for the GNOME desktop.

When told explicitly to install everything in a specific place?

This is a bug. Actually it’s several bugs, because in addition to desktop files there are man pages and pkg-config files, all of which it tries to write to “somewhere-other-than-where-you-told-it-to-put-everything”. (The pkg-config files are particularly troublesome, since they don’t get written into {installpath}/lib[64]/pkgconfig and get deleted when the installer finishes.) Per the NVIDIA install procedure, the --installpath option exists explicitly to relocate “everything”, so it should do so. If the installer can’t write something, it shouldn’t keep going just because that something is relatively low priority, like a man page or desktop link; it should fail, because it’s supposed to have write access and doesn’t. Assuming it’s trying to write where it’s been told to, that is…
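
For anyone who wants to see what lands outside the install path on their own system, here is a rough sketch: take a timestamp marker before the run, then list anything newer in the suspect directories afterwards. files_newer_than is a hypothetical helper, and the man page and pkgconfig locations in the usage line are my assumptions; only /usr/share/applications comes from the log above.

```shell
# List regular files under the given directories that were modified
# after the marker file $1. Remaining arguments are directories to scan.
files_newer_than() {
    marker="$1"
    shift
    find "$@" -type f -newer "$marker" 2>/dev/null
}

# Usage (directories other than /usr/share/applications are assumed):
#   marker="$(mktemp)"
#   ./cuda_11.5.0_495.29.05_linux.run --silent --toolkit ...
#   files_newer_than "$marker" /usr/share/applications /usr/share/man /usr/lib64/pkgconfig
```

This only catches files that still exist when the check runs, so anything the installer writes and then deletes again would not show up.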

BTW, something I noticed while stack-tracing the installer to see what it was actually doing prior to those writes: the --tmpdir argument doesn’t look like it gets sanitized. There’s no check for an absolute path, but a relative path crashes the install partway through with the error log ending “{arg provided via --tmpdir}: no such file”, while an absolute tmpdir runs fine. If the path needs to be absolute then a relative path should be rejected; otherwise it should be sanitized/expanded (usually via realpath, since just about everything implements something like it).
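
Until the installer does this itself, a wrapper can expand the path before handing it over. A minimal sketch using only cd and pwd, so it doesn't depend on realpath being installed; abs_dir is a hypothetical helper name:

```shell
# Turn a possibly-relative directory into an absolute, symlink-free path.
# Creates the directory first if it does not exist yet.
abs_dir() {
    mkdir -p -- "$1" && (cd -- "$1" && pwd -P)
}
```

The installer would then be given --tmpdir="$(abs_dir tmp)" instead of the raw relative path.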

Also BTW, there’s a similar problem with the installpath. I discovered that the initial failure happened because the install directory was being pre-created by higher-level wrapper scripting, which causes the installer to throw the “no permission” error. According to the stack trace, this is because the installer does not examine the return code from the attempt to create the last node in the installpath. This causes “failure because the target already exists” to be interpreted as “failure because the target could not be created”, then reported as “failure because permission is denied”. This may or may not be a bug: most forms of UN*X I’ve seen in the last few decades implement some form of “install” that permits target dir pre-creation so long as the final target node can still be written. But it’s never been a strongly uniform thing, so … maybe this just needs a better error message, or a preflight check for exists(installpath)?
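
On the wrapper side, the workaround that follows from this is to create only the parent directories and leave the last node for the installer. A sketch of such a preflight, based purely on the behaviour I observed with this one installer version; prepare_install_path is a hypothetical helper name:

```shell
# Prepare for an install into $1: create the parent directories,
# but refuse to continue if the final directory is already there.
prepare_install_path() {
    target="$1"
    if [ -e "$target" ]; then
        echo "refusing: $target already exists" >&2
        return 1
    fi
    mkdir -p -- "$(dirname -- "$target")"
}
```

A wrapper would call prepare_install_path /opt/soft/cuda/11.5.0_495.29.05 and only launch the runfile if it returns 0, which at least turns the misleading “permission denied” into an accurate message.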

When I look at the “Advanced Options” silent installation section of the CUDA 11.5.0 Installation Guide for Linux, I don’t see an --installpath option. There are --toolkitpath=, --samplespath= and --defaultroot=.

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-advanced

Look at the installer. Or look at the log I already posted, which shows the option being used to override the other *path options. You can go back several versions of the installer and see the same thing. That web page is not up to date with the installer itself. Understandable because the installer functionality is obviously higher priority than continuity with the static web content.

See the output log of a working run: it includes “[INFO]: Setting globalpath=…” when the only argument with that or any path is --installpath. Here’s another example:

[root@login00 005_cudatoolkit]# md5sum cuda_11.5.0_495.29.05_linux.run 
888a4538c0d12a8be06279bbc6e3e9b0  cuda_11.5.0_495.29.05_linux.run

[root@login00 005_cudatoolkit]# ./cuda_11.5.0_495.29.05_linux.run --help
Options:

...
  --librarypath=<path>
    Install libraries to the <path> directory. If this flag is not provided,
    the default path of your distribution is used. This flag only applies to
    libraries installed outside of the CUDA Toolkit path.

  --installpath=<path>
    Install everything to the <path> directory. This flag sets the same values
    as the toolkitpath, samplespath, and librarypath options.

  --extract=<path>
    Extracts driver runfile and the raw files of the toolkit and samples to 
    <path>.
...

[root@login00 005_cudatoolkit]# tmpdir="$(mktemp -d)"
[root@login00 005_cudatoolkit]# mount -t tmpfs -o size=20G tmpfs $tmpdir
[root@login00 005_cudatoolkit]# ./cuda_11.5.0_495.29.05_linux.run --silent \
    --toolkit --samples --installpath=/opt/soft/cuda/11.5.0_495.29.05 --override \
    --tmpdir=$tmpdir
[root@login00 005_cudatoolkit]# echo $?
0
[root@login00 005_cudatoolkit]# umount $tmpdir
[root@login00 005_cudatoolkit]# rmdir $tmpdir

[root@login00 005_cudatoolkit]# head -5 /var/log/cuda-installer.log
[INFO]: Setting silent=true
[INFO]: Setting toolkit=true
[INFO]: Setting samples=true
[INFO]: Setting globalpath=/opt/soft/cuda/11.5.0_495.29.05
[INFO]: Overriding compiler check

How much sense does it make to support relocation of the libraries but not the pkgconfig directory inside the libraries location? Or to support relocation of the samples but not the man pages?