Stuck in Post-installation Actions/13.1.2. POWER9 Setup

Hi, I am trying to install the latest CUDA version in WSL2 on my notebook (NVIDIA GeForce RTX 3060 6GB) to use it for GROMACS. I started by purging all NVIDIA-related files and then followed the installation guide.

During the process I encountered multiple errors or did not really know what to do next, but so far I was more or less able to solve everything and continue. Some of the issues:

1 The command lspci | grep -i nvidia
did not work - probably due to WSL?

2 Install GPUDirect Storage:
I have no clue whether this is relevant for me. It was too complex to understand and seemed rather tedious to install.

3 The command nvcc yields

Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

which seems strange to me, since this is what I just installed, right? So overall, what is the difference between the complicated setup shown on the website and the single command
sudo apt install nvidia...?

4 The command systemctl status nvidia-persistenced
yielded

system has not been booted with systemd as init system (pid 1). can't operate. Failed to connect to bus: Host is down.

After I activated systemd, I received another error message, which is my current problem:

Unit nvidia-persistenced.service could not be found.

Unfortunately, I cannot find a lot of information concerning this problem.

For installation I used the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.3.0/local_installers/cuda-repo-wsl-ubuntu-12-3-local_12.3.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-3-local_12.3.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

I also exported the path into the $PATH variable in my .profile file:

if [ -d "/usr/local/cuda-12.2" ] ; then
    export CUDA_HOME=/usr/local/cuda-12.2
    export PATH=${CUDA_HOME}/bin:${PATH:+:${PATH}}
fi

The command

nvidia-smi

yields:

Tue Oct 31 23:26:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01              Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P8              10W /  65W |     12MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       408      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

I would be really glad if someone could help me with this. Probably all the errors are rooted in the same underlying problem. Unfortunately I am not an expert in this field and I cannot solve these problems on my own. Big thanks in advance!

You set the path to /usr/local/cuda-12.2 but (correctly) installed CUDA 12.3. nvcc lives in the bin subdirectory, so CUDA_HOME has to be /usr/local/cuda-12.3 and PATH has to include /usr/local/cuda-12.3/bin.
Please check that this path exists and contains nvcc.
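A minimal sketch of what the corrected .profile entry could look like, assuming the toolkit was installed to the default prefix /usr/local/cuda-12.3 (adjust if yours differs). prepend_path is a hypothetical helper name, not something shipped with CUDA. It also avoids the "::" that the original ${PATH:+:${PATH}} line would produce after the extra colon; an empty PATH entry is treated as the current directory.

```shell
# Hypothetical helper: prepend a directory to a PATH-style string without
# leaving an empty entry behind.
prepend_path() {
    # $1 = directory to add, $2 = existing PATH value (may be empty)
    printf '%s%s\n' "$1" "${2:+:$2}"
}

CUDA_DIR="/usr/local/cuda-12.3"   # assumption: default install prefix

if [ -x "${CUDA_DIR}/bin/nvcc" ]; then
    export CUDA_HOME="${CUDA_DIR}"
    PATH="$(prepend_path "${CUDA_HOME}/bin" "${PATH}")"
    export PATH
    nvcc --version                # should now report release 12.3
else
    echo "nvcc not found under ${CUDA_DIR}/bin; check the install location" >&2
fi
```

After editing .profile, open a new login shell (or source the file) so the change takes effect.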

GPUDirect Storage and nvidia-persistenced are only for real hardware, not WSL2.

Thank you very much. This was very helpful so far. However, I still have a bunch of (new) questions and problems concerning the post-installation steps:

You said that nvidia-persistenced is only for real hardware, not for WSL2. What exactly do you mean by that? Do you say so because I don’t have “real” Linux as my OS?
Does that mean that I can skip all the steps listed under 13.1.2. POWER9 Setup? Which post-installation steps are mandatory for WSL2 users?

Anyway, I followed the steps and commented out the udev rule as stated under 2). Was this even necessary in my case?

13.2.1. Install Persistence Daemon can be skipped, I guess, as I cannot/don’t have to use persistenced, right?

Also, I cannot verify that the driver is loaded. The corresponding command:

cat /proc/driver/nvidia/version

fails, since the /nvidia subdirectory doesn’t exist.

Next, I built the samples from the NVIDIA/cuda-samples repository on GitHub. Compilation worked so far, giving me only a few warnings about some applications not being supported on Linux x86_64 or that no MPI compiler was found. I don’t know if this is problematic. However, during compilation of 4_CUDA_Libraries it failed with:

make[1]: *** No rule to make target 'main.cu', needed by 'main.o'.  Stop.
make[1]: Leaving directory '/mnt/c/Users/rscho/Documents/GitHub/cuda-samples/Samples/4_CUDA_Libraries/cuDLALayerwiseStatsHybrid'
make: *** [Makefile:45: Samples/4_CUDA_Libraries/cuDLALayerwiseStatsHybrid/Makefile.ph_build] Error 2

I installed some of the required dependencies and reran make with the -k flag.
At least I was able to run deviceQuery and bandwidthTest successfully.

I also could not install the Nsight Eclipse Plugins, because the indicated directory contained no nsight_ee_plugins_manage.sh file, only a zip file, which I unzipped, but still no .sh in sight…

The command sudo apt-get remove --purge "cuda-repo--X-Y-local*" also did not work, telling me “ignoring file … as it has an invalid filename extension”, so I deleted it manually.

I hope you can help me with these problems as well. Unfortunately, it seems like I have to be an IT expert to do this on my own. Thanks, anyway!

By “real” hardware I mean a GPU that Linux drives directly, either by installing Linux bare-metal or in a VM with a passed-through NVIDIA GPU. WSL2 works differently: it doesn’t pass through a GPU but provides a kind of “proxy” that hands CUDA instructions from the Linux environment over to the NVIDIA driver running on Windows. So you mustn’t install the driver parts in WSL2, and nvidia-persistenced is one of them. So it is expected that the driver check (cat /proc/driver/nvidia/version) fails.
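As a rough sketch of what can be checked inside WSL2 instead: the paths below (/dev/dxg and /usr/lib/wsl/lib) are where the Windows-side driver is usually exposed to the Linux environment, but treat them as assumptions to verify on your system. looks_like_wsl is a hypothetical helper name.

```shell
# Hypothetical helper: guess from a kernel release string whether this is WSL.
looks_like_wsl() {
    case "$1" in
        *[Mm]icrosoft*) return 0 ;;
        *) return 1 ;;
    esac
}

if looks_like_wsl "$(uname -r)"; then
    # Assumed location of the GPU paravirtualization device in WSL2
    if [ -e /dev/dxg ]; then
        echo "/dev/dxg present"
    else
        echo "/dev/dxg missing"
    fi
    # Assumed location of the driver libraries provided by Windows
    if ls /usr/lib/wsl/lib/libcuda.so* >/dev/null 2>&1; then
        echo "libcuda found in /usr/lib/wsl/lib"
    else
        echo "libcuda not found in /usr/lib/wsl/lib"
    fi
else
    echo "This does not look like a WSL kernel; /proc/driver/nvidia/version is the check to use here"
fi
```

If both checks pass and nvidia-smi prints a table like the one above, the GPU is reachable from WSL2 without any Linux driver package installed.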

IBM POWER9 is a completely different CPU/system architecture, so nothing from that section should be applied to x86/WSL2. Please revert the udev rule change. For WSL2, just follow the docs for it:
https://docs.nvidia.com/cuda/wsl-user-guide/index.html

The samples that fail to compile are simply broken; the code is missing:
https://github.com/NVIDIA/cuda-samples/issues/235

Running apt-get remove --purge "cuda-repo--X-Y-local*" is error-prone because the pattern can pick up local files in the current working directory: e.g. if there is a file “cuda-repo--X-Y-local.txt” there, apt tries to uninstall a package with a .txt extension, which of course fails. When using apt with *, always make sure you’re in an empty directory.
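A small sketch of the shell behaviour behind this, using echo instead of apt so nothing gets removed: an unquoted * is expanded by the shell against files in the current directory before the command ever sees it, while a pattern in plain ASCII double quotes is passed through unchanged. The file name is made up for the demonstration.

```shell
orig_dir="$PWD"
demo_dir="$(mktemp -d)"            # throwaway directory for the demo
cd "$demo_dir"

touch cuda-repo-X-Y-local.txt      # made-up file that matches the pattern

# Unquoted: the shell replaces the pattern with the matching file name
unquoted="$(echo cuda-repo-*)"
# Quoted: the pattern reaches the command unchanged
quoted="$(echo "cuda-repo-*")"

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"

cd "$orig_dir"
rm -rf "$demo_dir"
```

Typographic quotes (“ ”) copied from a web page are not shell quotes, so retyping them as plain " is worth doing before running the command.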

The Nsight Eclipse plugins are likely in another repo package; the (missing) .sh file is only relevant when CUDA was installed using the runfile installer.