Problems (and solutions) installing the NVIDIA HPC SDK 24.9 (CUDA 12.6) over Kali

WSL2 is the most important Win11 app that I use, for Sage, PARI, Singular, and for mixing them with
Numba/CuPy/JAX, and soon more dask_cudf.

Recently, I decided to update my NVHPC to the new release, this time working over Kali WSL2.
I also wanted to see how it goes with Nsight Systems/Compute.

NVIDIA HPC SDK Current Release Downloads | NVIDIA Developer

To summarize: it is no big secret that WSL2 has serious RAM management problems.
I have at least one crash per day.

1) Installation problems
2) Problems with Numba
3) Problems with CuPy
OK, one by one.

a- wget https://developer.download.nvidia.com/hpc-sdk/24.9/nvhpc_2024_249_Linux_x86_64_cuda_12.6.tar.gz
b- tar xpzf nvhpc_2024_249_Linux_x86_64_cuda_12.6.tar.gz
c- nvhpc_2024_249_Linux_x86_64_cuda_12.6/install
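
For reference, the installer can also run non-interactively; this is only a sketch based on the SDK's documented installer environment variables, so check the install guide for your release:

# non-interactive single-system install into the default prefix
export NVHPC_SILENT=true
export NVHPC_INSTALL_DIR=/opt/nvidia/hpc_sdk
export NVHPC_INSTALL_TYPE=single
sudo -E nvhpc_2024_249_Linux_x86_64_cuda_12.6/install   # -E keeps the NVHPC_* variables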

Then, boom, it crashes. WSL2 has RAM problems: it uses RAM up to the upper limit and, instead
of aborting, it crashes. It is annoying that this happens even when just copying some files. What is the problem?
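
One common mitigation (not a fix for the installer itself) is to cap the WSL2 VM's memory and give it swap in a .wslconfig file on the Windows side. A minimal sketch; the Windows user name and the sizes are placeholders to adjust:

# Cap the WSL2 VM's RAM and add swap so big copies spill to disk
# instead of killing the VM. User name and sizes are placeholders.
cat > /mnt/c/Users/<your-windows-user>/.wslconfig <<'EOF'
[wsl2]
memory=12GB
swap=16GB
EOF
# Then, from PowerShell on the Windows side, run: wsl --shutdown
# and reopen Kali so the new limits take effect.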

It turned out that the install had already copied many files before it crashed.

I used ncdu to compare what had already been copied.
The first time, I just restarted the install, but got the same crash.
I was not going to let a 12 GB download go to waste, so again with ncdu I compared what had been copied and what remained.
Then, here is the simple trick:

DELETE some directories from the untarred dir (the one you created in step b using tar xpzf …),
for example delete
drwxr-xr-x 6 mabd mabd 4096 Oct 10 18:46 comm_libs
drwxr-xr-x 15 mabd mabd 4096 Sep 24 02:44 compilers
drwxr-xr-x 3 mabd mabd 4096 Oct 10 18:46 cuda
if they have already been copied, and then launch the install again for the remaining ones (a concrete sketch of this resume trick follows the listing):
drwxr-xr-x 11 mabd mabd 4096 Oct 10 20:35 examples
drwx------ 3 root root 4096 Oct 10 18:46 math_libs
drwxr-xr-x 4 mabd mabd 4096 Sep 24 02:38 profilers
drwxr-xr-x 6 mabd mabd 4096 Sep 24 02:43 REDIST
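
A rough sketch of the resume trick, assuming the default install prefix /opt/nvidia/hpc_sdk; the exact location of these directories inside the untarred tree may differ, so locate them first, and the <untarred-path> below is a placeholder:

# see how far the crashed install got
du -sh /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/* 2>/dev/null

# find where a component (e.g. compilers) lives inside the untarred tree
find nvhpc_2024_249_Linux_x86_64_cuda_12.6 -maxdepth 4 -type d -name compilers

# remove the components that were already fully copied
rm -rf <untarred-path>/comm_libs <untarred-path>/compilers <untarred-path>/cuda

# re-run the installer; it only has to copy what is left
nvhpc_2024_249_Linux_x86_64_cuda_12.6/install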

It will copy the rest, create the needed symbolic links, and then give you the nice


generating environment modules for NV HPC SDK 24.9 … done.
Installation complete.
HPC SDK successfully installed into /opt/nvidia/hpc_sdk

If you use the Environment Modules package, that is, the module load
command, the NVIDIA HPC SDK includes a script to set up the
appropriate module files.

% module load /opt/nvidia/hpc_sdk/modulefiles/nvhpc/24.9
% module load nvhpc/24.9

Alternatively, the shell environment may be initialized to use the HPC SDK.

In csh, use these commands:

% set path = (/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin $path)
% setenv MANPATH /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/man:"$MANPATH"

To use MPI, also set:

% set path = (/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/mpi/bin $path)

In bash, sh, or ksh, use these commands:

$ export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin:$PATH
$ export MANPATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/man:$MANPATH

To use MPI, also set:

$ export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/comm_libs/mpi/bin:$PATH

Please check https://developer.nvidia.com for documentation,
use of NVIDIA HPC SDK software, and other questions.


Very good.

Over Kali there are no modules out of the box; just google for installing
[install environment module package]
It is a two-step install: install Lua and then install Lmod (a sketch follows).
Once done, you are ready to test your installation.
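
For reference, roughly what that two-step install looks like on a Debian-based system; the package names and the Lmod prefix here are assumptions, so follow the Lmod README if anything differs:

# Step 1: Lua and the Lua modules Lmod depends on (Debian package names, may differ)
sudo apt update
sudo apt install lua5.3 liblua5.3-dev lua-posix lua-filesystem tcl

# Step 2: build Lmod from a source release (see https://github.com/TACC/Lmod),
# installing under /usr/local so the init script lands in /usr/local/lmod/lmod/init/
cd Lmod-*            # the unpacked Lmod source directory (placeholder)
./configure --prefix=/usr/local
sudo make install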

To make your life easy, copy the module file 24.9 from
/opt/nvidia/hpc_sdk/modulefiles/nvhpc/
to
/usr/share/modules/modulefiles
Then, to activate your nvhpc, just type: module load 24.9
Now, typing nv and then Tab should give you a lot of nv* completions:

nvaccelerror nvcpl.dll nvdlisrwrapper.exe nvlink.exe nvprof nvunzip
nvaccelinfo nvcpuid nvEncodeAPI64.dll nvml.dll nvprof.exe nvvp
nvapi64.dll nvcudadebugger.dll nvextract nvngx_dlisr.dll nvprune nvvp.bat
nvaudcap64v.dll nvcuda.dll nvfortran nv-nsight-cu-cli nvprune.exe nvvp.exe
nvblas64_11.dll nvcudainit nvidia-pcc.exe nvofapi64.dll nvrtc64_112_0.dll nvvp.ini
nvc nvcuvid.dll nvidia-smi nvperf_host.dll nvrtc-builtins64_114.dll nvzip
nvc++ nvdebugdump.exe nvidia-smi.exe nvperf_host.lib nvrtc-builtins64_117.dll
nvcc nvdecode nvinfo.pb nvperf_target.dll nvsize
nvcc.exe nvdisasm nvjpeg64_11.dll nvperf_target.lib nvspcap64.dll
nvcc.profile nvdisasm.exe nvlink nvprepro nvspinfo.exe

These come from both Win11 (the *.exe and *.dll entries) and your Kali install.

Now type nvaccelinfo:

CUDA Driver Version: 12070

Device Number: 0
Device Name: NVIDIA GeForce GTX 1660 Ti with Max-Q Design
Device Revision Number: 7.5
Global Memory Size: 6442123264
Number of Multiprocessors: 24
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1335 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 6001 MHz
Memory Bus Width: 192 bits
L2 Cache Size: 1572864 bytes
Max Threads Per SMP: 1024
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: No
Preemption Supported: Yes
Cooperative Launch: Yes
Unified Memory: No
Memory Models Flags: -gpu=mem:separate
Default Target: cc75

Good. Try some:

nvcc -V
nvc -V
nvc++ -V

Everything should be OK.

Now for Numba.

Try numba -s; it gives you:
CUDA Libraries Test Output:
not found

???

I dug a little inside Numba, down to
./.local/lib/python3.11/site-packages/numba/cuda/cudadrv/libs.py

Now, after a little reading, life is easy. All that is needed is:

export CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda
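
With that export in place, a quick check (assuming Numba's CUDA target is installed) is:

numba -s | grep -i -A 5 "CUDA Libraries"             # the library test should now find the libraries
python3 -c "from numba import cuda; cuda.detect()"   # should list the GPU (here, the GTX 1660 Ti)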

Now for
3) CuPy: just be sure to run

module load 24.9

before using CuPy.

You might also add the CUDA compiler binaries to your PATH:

export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin:$PATH

In summary, after the install it is just a path problem. These three lines can save you time:
module load 24.9
export CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin:$PATH

Now working over Kali: type kex.
But the Kali/KeX terminal does not recognize that you have installed module.
Type
echo $SHELL
and make sure it is bash.

Then use

source /usr/local/lmod/lmod/init/bash

Now module load 24.9 should work, and you can work with nvhpc from inside Kali/KeX.
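
To avoid retyping all of this in every new Kali/KeX terminal, one option (a sketch, paths as above) is to append it to ~/.bashrc:

cat >> ~/.bashrc <<'EOF'
# NVHPC 24.9 environment for every new bash shell
source /usr/local/lmod/lmod/init/bash
module load 24.9
export CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/cuda
export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/24.9/compilers/bin:$PATH
EOF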

Nsight Systems/Compute launch, but I have not tested them thoroughly yet.

Good luck

Thank you nvidia for all the hard work. We love you.

It seems that CuPy only looks for the plain CUDA Toolkit libraries:

>>> import cupy as cp
>>> cp.random.seed()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mabd/.local/lib/python3.11/site-packages/cupy/random/_generator.py", line 1280, in seed
    get_random_state().seed(seed)
    ^^^^^^^^^^^^^^^^^^
  File "/home/mabd/.local/lib/python3.11/site-packages/cupy/random/_generator.py", line 1312, in get_random_state
    rs = RandomState(seed)
         ^^^^^^^^^^^^^^^^^
  File "/home/mabd/.local/lib/python3.11/site-packages/cupy/random/_generator.py", line 56, in __init__
    from cupy_backends.cuda.libs import curand
ImportError: libcurand.so.10: cannot open shared object file: No such file or directory
>>> exit()
Exception ignored in: <function RandomState.__del__ at 0x7c44397285e0>
Traceback (most recent call last):
  File "/home/mabd/.local/lib/python3.11/site-packages/cupy/random/_generator.py", line 65, in __del__
    from cupy_backends.cuda.libs import curand
ImportError: libcurand.so.10: cannot open shared object file: No such file or directory

Now, to work around it:

cd /usr/local
mkdir cuda
cd cuda
cp /opt/nvidia/hpc_sdk/Linux_x86_64/24.9/math_libs/12.6/lib64/libcurand.so.10 .

This should make cp.random work. But what about the rest of CuPy's CUDA dependencies?
OK:

┌──(mabd㉿LAPTOP-T8DQ9UK0)-[~/.local/lib/python3.11/site-packages]
└─$ ldd $(find cupy cupyx -name '*.so') | awk '{print $1}' | sort | uniq | grep lib
cupy/lib/_polynomial.cpython-311-x86_64-linux-gnu.so:
/lib64/ld-linux-x86-64.so.2
libc.so.6
libcublasLt.so.12
libcublas.so.12
libcufft.so.11
libcusolver.so.11
libcusparse.so.12
libdl.so.2
libgcc_s.so.1
libm.so.6
libnvJitLink.so.12
libnvrtc.so.12
libpthread.so.0
librt.so.1
libstdc++.so.6

Now just copy these libcu* (and libnv*) libraries to your /usr/local/cuda as well.
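
A sketch of that copy, using find so you do not have to remember which HPC SDK subdirectory each library lives in (library names taken from the ldd output above; adjust for your versions):

cd /usr/local/cuda
for lib in libcublas.so.12 libcublasLt.so.12 libcufft.so.11 libcusolver.so.11 \
           libcusparse.so.12 libnvJitLink.so.12 libnvrtc.so.12; do
  # locate the library somewhere under the HPC SDK tree and copy it here
  src=$(find /opt/nvidia/hpc_sdk/Linux_x86_64/24.9 -name "$lib" | head -n 1)
  echo "copying $src"
  sudo cp "$src" .
done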

And now CuPy works with NVHPC over Kali. I think it should be the same for Ubuntu 22.04/24.04.
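
A quick smoke test (any small CuPy call that touches cuRAND will do):

python3 -c "import cupy as cp; cp.random.seed(0); print(cp.random.rand(3))"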

Good luck.

Now I can test JAX vs CuPy.
So far, jitted JAX is faster than CuPy, but I need to do more tests.