Use GPU with LAPACK DPBSV routine on Jetson-Nano

Hi,

I am calling the DPBSV Cholesky factorization routine from LAPACK in a Fortran project. I successfully ported the project from gfortran to pgfortran, and the code compiles fine (using CMake).

The LAPACK and CUDA libraries seem to be loaded, but when I check, GPU activity is 0% during the execution of my program, while all CPU cores are used!
What am I missing in the compilation flags? Is there something else to set up in my CMakeLists.txt so that the computation runs on the GPU?

Here is my CMakeLists.txt:

cmake_minimum_required(VERSION 3.15)
enable_language ( Fortran )
project(my-project)
# PGI flags
set(CMAKE_Fortran_FLAGS "-fast -O4")
find_package(CUDA)
find_package(BLAS)
find_package(LAPACK REQUIRED)
file(GLOB SRC_FILES src/*.f90)
add_executable(myexec ${SRC_FILES})
target_link_libraries(myexec lapack)

And here is the output of the build:

-- The Fortran compiler identification is PGI 20.7.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran - skipped
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran supports Fortran 90
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgfortran supports Fortran 90 - yes
-- The C compiler identification is PGI 20.7.0
-- The CXX compiler identification is PGI 20.7.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/2020/compilers/bin/pgc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0 (found version "11.0") 
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found BLAS: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib/libblas.so  
-- Looking for Fortran cheev
-- Looking for Fortran cheev - not found
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/damian/projects/my-project/build
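
One way to confirm the 0% GPU activity mentioned above on a Jetson is to watch the GR3D_FREQ field that tegrastats prints for the GPU. The sketch below is only an illustration: the helper name gpu_load is made up here, and the field name is assumed to match what your L4T release prints.

```shell
# Hypothetical helper: keep only the GPU load field (GR3D_FREQ) from
# tegrastats-style lines read on stdin.
gpu_load() {
    # --line-buffered makes each match appear immediately when used in a live pipe
    grep --line-buffered -o 'GR3D_FREQ [0-9]*%'
}

# Usage on the Jetson, while the program runs in another terminal
# (tegrastats keeps printing until interrupted with Ctrl-C):
#   tegrastats | gpu_load
```

If the value stays at 0% while the solver runs, the work is being done on the CPU.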

Hi,

Are you using a Jetson Nano? (I ask because this is the Nano board.)

Based on your log, it looks like CUDA 11.0 is installed in your environment.
However, we don’t currently have a CUDA 11.0 release for Jetson devices.

-- Found CUDA: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/11.0 (found version "11.0") 

Thanks.

Hi,

Yes, I use a Jetson Nano with the latest HPC SDK (CUDA 11 ships with this SDK), since I use pgfortran.

If I do not put the HPC SDK in the PATH, I do not have BLAS, which is required by LAPACK. It seems CMake still looks for LAPACK in the HPC SDK location…

-- The Fortran compiler identification is PGI 20.7.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran - skipped
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran supports Fortran 90
-- Checking whether /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgfortran supports Fortran 90 - yes
-- The C compiler identification is PGI 20.7.0
-- The CXX compiler identification is PGI 20.7.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin/pgc++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "10.2") 
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found BLAS: /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib/libblas.so  
-- Looking for Fortran cheev
-- Looking for Fortran cheev - not found
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/damian/projects/refonte-alefe/build

On execution, the BLAS library is not found:

error while loading shared libraries: libblas.so.0: cannot open shared object file: No such file or directory
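
An error like this comes from the runtime loader rather than from CMake: the library was found at link time but is not on the loader's search path when the program starts. A minimal sketch for checking this follows. The helper name missing_libs is made up for illustration, and the library directory in the comment is the one shown in the "Found BLAS" line of the log above, so adjust it if your installation differs.

```shell
# Hypothetical helper: list the shared libraries that the loader cannot
# resolve for the given executable (lines that ldd marks as "not found").
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Usage, from the build directory:
#   missing_libs ./myexec
# If libblas.so.0 is listed, one option is to add its directory to the
# loader path before running (path taken from the cmake log above):
#   export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib:$LD_LIBRARY_PATH
```

An empty result means every dependency resolves in the current environment.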

Hi,

Sorry, I didn’t notice that the HPC SDK does support ARM systems.
Let me check this with our internal team and get back to you with more feedback later.

Thanks.

Hi,

BLAS and LAPACK can be installed via apt-get install.
After that, please run cmake like this:

$ cmake . -DLAPACK_LIBRARIES=/usr/lib/aarch64-linux-gnu/ -DBLAS_LIBRARIES=/usr/lib/aarch64-linux-gnu/

Thanks.

Hi,
Thanks for your response. LAPACK and BLAS are found, but the program still does not use the GPU!
I am wondering whether the HPC SDK LAPACK uses the GPU at all, or whether I should use another equivalent library.
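
A quick check that bears on this question is whether the executable links any CUDA math library at all. The standard LAPACK interface (DPBSV included) is a host routine, and as far as I know the libblas and liblapack found in the log are CPU libraries, so a GPU code path would normally require linking and calling something like cuBLAS or cuSOLVER explicitly. The helper name linked_gpu_libs below is made up for illustration.

```shell
# Hypothetical helper: show which CUDA libraries (cuBLAS, cuSOLVER, cuSPARSE,
# cuFFT, CUDA runtime) the given executable is dynamically linked against.
linked_gpu_libs() {
    ldd "$1" 2>/dev/null | grep -iE 'libcu(blas|solver|sparse|fft|dart)'
}

# Usage, from the build directory:
#   linked_gpu_libs ./myexec
```

If nothing is printed, the binary has no dynamic dependency on these CUDA libraries, which would be consistent with all of the work staying on the CPU cores.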

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Would you mind sharing the cmake output log with us?
On our side, cmake finds CUDA 10.2 correctly.

Thanks.