GPU is not working with nvfortran

I have an RTX 3080 display card. I installed nvhpc-22.5 on CentOS 8 Stream. I am testing it with saxpy.f90. It compiles fine with no errors, but when I run it, it gives me errors; see below. I have no clue what to do next. Any suggestion is appreciated. Thank you!

$ nvfortran -stdpar=gpu -Minfo saxpy.f90 -o saxpy -cudaforlibs
saxpy_concurrent:
26, Generating NVIDIA GPU code
26, Loop parallelized across CUDA thread blocks, CUDA threads(128) blockidx%x threadidx%x
saxpy_do:
36, FMA (fused multiply-add) instruction(s) generated
$ ./saxpy
Current file: /home/jluo/saxpy.f90
function: saxpy_concurrent
line: 26
This file was compiled: -acc=gpu -gpu=cc35 -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -
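For context, the saxpy_concurrent routine that the error message points at (line 26) is presumably the standard DO CONCURRENT form of saxpy that -stdpar=gpu offloads to the GPU. The routine interface and names below are a sketch, not copied from the actual saxpy.f90:

```fortran
! Hypothetical sketch of a saxpy kernel using DO CONCURRENT.
! With -stdpar=gpu, nvfortran parallelizes this loop across
! CUDA thread blocks, as reported by -Minfo above.
subroutine saxpy_concurrent(n, a, x, y)
  implicit none
  integer, intent(in)    :: n
  real,    intent(in)    :: a, x(n)
  real,    intent(inout) :: y(n)
  integer :: i
  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do
end subroutine saxpy_concurrent
```

The failure happens at run time, not compile time, which is consistent with the generated GPU code being fine but no usable GPU being found when the kernel launches.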

Hi Jluo1,

This usually means that the GPU is not accessible for some reason.

What CUDA driver do you have installed?
What does the output from running the “nvaccelinfo” utility show?

-Mat

Hi Mat, you are right. It cannot find the GPU. I have CUDA 11.7. Also, I have set LD_LIBRARY_PATH; see below. Thank you!

$ nvaccelinfo -v
libcuda.so not found
No accelerators found.
Check that you have installed the CUDA driver properly
Check that your LD_LIBRARY_PATH environment variable points to the CUDA runtime installation directory
$ echo $LD_LIBRARY_PATH
/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/targets/x86_64-linux/lib/stabs:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/REDIST/cuda/11.7/targets/x86_64-linux/lib/stabs

While I don’t know for sure, I’m guessing you have the Nouveau graphics driver installed, which doesn’t support CUDA. The CUDA driver (i.e. libcuda.so) is not included with the NVIDIA HPC SDK and must be installed separately (this is why setting LD_LIBRARY_PATH isn’t working).

Please go to: Official Drivers | NVIDIA

Hope this helps,
Mat

Hi Mat,

I have re-installed CentOS 8 Stream and the NVIDIA driver. It is working now. Thank you for pointing me in the right direction! See below:

[jluo@localhost jacobi]$ nvfortran -stdpar=gpu jacobi2.f90 -Minfo
smooth:
27, Generating NVIDIA GPU code
27, ! blockidx%x threadidx%x auto-collapsed
Loop parallelized across CUDA thread blocks, CUDA threads(128) collapse(2) ! blockidx%x threadidx%x collapsed-innermost
32, Generating NVIDIA GPU code
32, ! blockidx%x threadidx%x auto-collapsed
Loop parallelized across CUDA thread blocks, CUDA threads(128) collapse(2) ! blockidx%x threadidx%x collapsed-innermost
smoothhost:
48, FMA (fused multiply-add) instruction(s) generated
55, FMA (fused multiply-add) instruction(s) generated
[jluo@localhost jacobi]$ ./a.out
4930 microseconds on parallel with do concurrent
312228 microseconds on sequential
parallel is 63 times faster than sequential
Test PASSED
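For anyone following along, the smooth routine that the -Minfo output reports as a collapsed two-level loop presumably has this general shape: a nested DO CONCURRENT over the interior points, which nvfortran collapses into a single GPU kernel (the "collapse(2)" in the output above). Array names, bounds, and stencil weights here are assumptions, not taken from jacobi2.f90:

```fortran
! Hypothetical sketch of a Jacobi smoothing step as a nested
! DO CONCURRENT. nvfortran with -stdpar=gpu collapses the two
! index ranges into one GPU kernel launch.
! w0/w1 are assumed stencil weights; bounds skip the boundary.
subroutine smooth(a, b, w0, w1, n, m)
  implicit none
  integer, intent(in)  :: n, m
  real,    intent(in)  :: w0, w1, b(n, m)
  real,    intent(out) :: a(n, m)
  integer :: i, j
  do concurrent (i = 2:n-1, j = 2:m-1)
    a(i, j) = w0 * b(i, j) &
            + w1 * (b(i-1, j) + b(i+1, j) + b(i, j-1) + b(i, j+1))
  end do
end subroutine smooth
```

The reported ~63x speedup over the sequential version is in line with offloading a stencil like this to an RTX 3080.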
