Use GPU with LAPACK DPBSV routine on Jetson-Nano

DamianD · August 16, 2020, 9:56am

Hi,

I am calling the DPBSV Cholesky factorization routine from LAPACK in a Fortran project. I successfully ported the project from gfortran to pgfortran, and the code compiles well (using CMake).

The LAPACK and CUDA libraries seem to be loaded, but when I check, GPU activity is 0% during the execution of my program, while all CPU cores are in use!
What am I missing in the compilation flags? Is there something else I need to set up in my CMakeLists.txt so that the computation runs on the GPU?

Here is my CMakeLists.txt:

cmake_minimum_required(VERSION 3.15)
enable_language ( Fortran )
project(my-project)
# PGI flags
set(CMAKE_Fortran_FLAGS "-fast -O4")
find_package(CUDA)
find_package(BLAS)
find_package(LAPACK REQUIRED)
file(GLOB SRC_FILES src/*.f90)
add_executable(myexec ${SRC_FILES})
target_link_libraries(myexec lapack)

And here is the output of the build:

-- The Fortran compiler identification is PGI 20.7.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran - skipped
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran supports Fortran 90
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran supports Fortran 90 - yes
-- The C compiler identification is PGI 20.7.0
-- The CXX compiler identification is PGI 20.7.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0 (found version "11.0") 
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found BLAS: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib/libblas.so  
-- Looking for Fortran cheev
-- Looking for Fortran cheev - not found
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/damian/projects/my-project/build

AastaLLL · August 17, 2020, 3:42am

Hi,

Are you using a Jetson Nano? (I ask because this is the Nano board.)

Based on your log, it looks like CUDA 11.0 is installed in your environment.
However, we don’t currently have a CUDA 11.0 release for Jetson devices.

-- Found CUDA: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0 (found version "11.0")

Thanks.

DamianD · August 17, 2020, 8:44am

Hi,

Yes, I am using a Jetson Nano with the latest HPC SDK (CUDA 11 ships with this SDK), since I use pgfortran.

DamianD · August 17, 2020, 8:57am

If I do not put the HPC SDK in the PATH, I do not have BLAS, which is required by LAPACK. It seems CMake still looks for LAPACK in the HPC SDK location…

-- The Fortran compiler identification is PGI 20.7.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran - skipped
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran supports Fortran 90
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran supports Fortran 90 - yes
-- The C compiler identification is PGI 20.7.0
-- The CXX compiler identification is PGI 20.7.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "10.2") 
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found BLAS: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib/libblas.so  
-- Looking for Fortran cheev
-- Looking for Fortran cheev - not found
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/damian/projects/refonte-alefe/build

On execution, the BLAS library is not found:

error while loading shared libraries: libblas.so.0: cannot open shared object file: No such file or directory
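
The loader error above means the executable was linked against a shared BLAS that it cannot locate at run time. One possible way to handle this from CMake is to link the full library paths that find_package reports, rather than the bare name `lapack`, and to record the library directory as a run path. This is a minimal sketch, not a verified fix: it assumes the `myexec` target and `SRC_FILES` variable from the CMakeLists.txt earlier in this thread, and it assumes that `libblas.so.0` actually resides in the directory printed in the "Found BLAS" log line, which should be checked first.

```cmake
# Sketch only: locate BLAS/LAPACK and link them by the full paths CMake found.
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

add_executable(myexec ${SRC_FILES})
target_link_libraries(myexec ${LAPACK_LIBRARIES} ${BLAS_LIBRARIES})

# Assumption: the shared libraries live in the directory shown in the
# "Found BLAS" line of the configure log. Adjust if they are elsewhere.
set_target_properties(myexec PROPERTIES
  BUILD_RPATH "/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib")
```

This only addresses the "cannot open shared object file" error; it does not by itself change whether the routine runs on the CPU or the GPU.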

AastaLLL · August 18, 2020, 4:09am

Hi,

Sorry, I hadn’t noticed that the HPC SDK supports ARM systems.
Let me check this with our internal team and get back to you with more feedback later.

Thanks.

AastaLLL · August 18, 2020, 9:46am

Hi,

BLAS and LAPACK can be installed via apt-get install.
After that, please run cmake like this:

$ cmake . -DLAPACK_LIBRARIES=/usr/lib/aarch64-linux-gnu/ -DBLAS_LIBRARIES=/usr/lib/aarch64-linux-gnu/

Thanks.
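
To confirm which libraries a given configure run actually resolved, the result variables can be printed at configure time. This is a minimal sketch, assuming it is placed in the CMakeLists.txt from earlier in this thread after the existing find_package calls. `BLAS_LIBRARIES` and `LAPACK_LIBRARIES` are the standard result variables of CMake's FindBLAS and FindLAPACK modules and normally hold full paths to library files rather than directories.

```cmake
# Sketch only: print what FindBLAS / FindLAPACK resolved, so the configure
# log shows whether the system libraries or the HPC SDK copies were picked.
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)

message(STATUS "BLAS libraries:   ${BLAS_LIBRARIES}")
message(STATUS "LAPACK libraries: ${LAPACK_LIBRARIES}")
```

Comparing these two lines between runs makes it easier to see whether the `-DBLAS_LIBRARIES` and `-DLAPACK_LIBRARIES` overrides took effect.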

DamianD · August 24, 2020, 12:03pm

Hi,
Thanks for your response. LAPACK and BLAS are found, but the program still does not use the GPU!
I am wondering whether the HPC SDK LAPACK uses the GPU at all, or whether I should use another equivalent library?

AastaLLL · August 28, 2020, 3:50am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Would you mind sharing the cmake output log with us?
Our cmake can find CUDA 10.2 correctly.

Thanks.
