Jetson Nano and HPC SDK

Hi guys,

I recently got a Jetson Nano rev B01 and I’m having some trouble running the HPC SDK.

I’ve installed the official Ubuntu 18.04 image for the Nano, which provides CUDA 10.2. I’ve checked that it works well with some simple CUDA C code; all is OK.

Since CUDA 11 is not officially supported, I downloaded and installed HPC SDK version 20.7, which is compatible with the 64-bit Arm architecture and CUDA 10.2.

The installation went well, but I can’t run any CUDA Fortran or OpenACC code.

For example, here is a simple CUDA Fortran code:

module kernels
contains
! Kernel: each thread prints its index and doubles the array.
attributes(global) subroutine doWork(a)
implicit none
real(KIND=8) :: a(:)
integer :: i
i = threadIdx%x
print*, "Idx = ",i
a = 2.0*a
end subroutine doWork
end module kernels

PROGRAM Version
use cudafor
use kernels
implicit none
integer :: cversion,err,rversion
type(cudaDeviceProp) :: propdev
real(kind=8) :: a(1)
real(kind=8), device :: a_d(1)

err = cudaGetDeviceProperties(propdev, 0)
err = cudaDriverGetVersion(cversion)
err = cudaRuntimeGetVersion(rversion)

print*, "Device : ",propdev%name
print*, "Cuda driver : Major = ",cversion/1000,"minor = ",mod(cversion,100)/10
print*, "Runtime ver : Major = ",rversion/1000,"minor = ",mod(rversion,100)/10
print*, "========================================================="

a = 1.0
a_d = a
call doWork<<<1,1>>>(a_d)
a = a_d
print*, "End, a = ",a

end program version

Here is the output when compiled with nvfortran:
Device :
NVIDIA Tegra X1
Cuda driver : Major = 10 minor = 2
Runtime ver : Major = 10 minor = 2

End, a = 1.000000000000000

The CUDA driver and CUDA runtime are correctly detected, but no threads are running.
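For reference, the version numbers printed above are decoded from the single integer that cudaDriverGetVersion and cudaRuntimeGetVersion return, which CUDA encodes as 1000*major + 10*minor. Below is a minimal shell sketch of that decoding. The helper name decode_cuda_version is hypothetical, and 10020 is assumed to be the raw value behind the "10" and "2" shown in the output.

```shell
# Decode a CUDA version integer (1000*major + 10*minor) into "major.minor".
# decode_cuda_version is a hypothetical helper name used for illustration.
decode_cuda_version() {
    v=$1
    echo "$(( v / 1000 )).$(( (v % 1000) / 10 ))"
}

# 10020 is the encoded form of CUDA 10.2 (assumed raw value, see above).
decode_cuda_version 10020
```

This only confirms that the driver and runtime report matching versions; it says nothing about whether a kernel actually launched.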

Does anyone know what’s going on?
I’ve seen somewhere that the Nano board uses a specific driver. Is that the source of the problem?

Thanks for your help

B-

Hi,

Based on the support matrix below, you should install the 2020.9 release for nvfortran + CUDA 10.2:

Thanks.

Hi AastaLLL,

Thanks for your reply.
However, this didn’t do the trick.

First, as mentioned in the support matrix of the 20.7 documentation:

the 20.7 version is also compatible with CUDA 10.2.

Moreover, 20.9 depends on CUDA 11, and the install fails if CUDA 11 is not installed. See, for example, the installer output on the Nano (same with the .deb package):

ERROR: The determined CUDA version from the driver (10.2) is
older than the oldest bundled CUDA version (11.0)
CUDA versions are not forwards-compatible
Exiting…
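The comparison described in that message can be illustrated with a small shell sketch. This is not the installer's actual logic, only an assumed equivalent: version_lt is a hypothetical helper, it relies on GNU sort's -V option, and the two version numbers are copied from the error text above.

```shell
# version_lt A B: succeeds (exit 0) when version A is strictly older than B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

driver_cuda=10.2      # CUDA version determined from the driver (from the message)
oldest_bundled=11.0   # oldest CUDA version bundled with the package (from the message)

if version_lt "$driver_cuda" "$oldest_bundled"; then
    echo "driver CUDA $driver_cuda is older than bundled CUDA $oldest_bundled"
fi
```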

As mentioned here by kayccc:

CUDA 11 is not officially supported on the Jetson Nano.

Hence, only 20.7 seems to be compatible with the Jetson Nano.

Do you have any other ideas to make it work?

Thanks,

B-

Hi,

Thanks.

We do not have CUDA 11.0 for Jetson currently.
Let us check this internally and share a more precise suggestion with you later.

Hi,

We can run a default CUDA Fortran sample without issue.
Please check whether you can do the same.

Install HPC

$ wget https://developer.download.nvidia.com/hpc-sdk/20.7/nvhpc-20-7_20.7_arm64.deb
$ wget https://developer.download.nvidia.com/hpc-sdk/20.7/nvhpc-2020_20.7_arm64.deb
$ sudo apt-get install ./nvhpc-20-7_20.7_arm64.deb ./nvhpc-2020_20.7_arm64.deb

Compile sample

$ export PATH=/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/bin:$PATH
$ cd /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/cuda/
$ sudo ln -s /usr/local/cuda-10.2 10.2
$ cd ../examples/CUDA-Fortran/TensorCores/m16n16k16/
$ make

Log

nvfortran -Mcuda -c  ../Utils/vector_types.F90
nvfortran -Mcuda -c ../Utils/check_mod.f90
nvfortran -Mcuda -I../Utils -o wmma1.out t1.CUF wmma1.CUF check_mod.o vector_types.o
t1.CUF:
wmma1.CUF:
./wmma1.out
256 tests completed. 256 tests PASSED. 0 tests failed.

Thanks.

Hi AastaLLL,

Many thanks for your reply, but it failed…

I restarted from a fresh install of the official Ubuntu 18.04 image.
I re-tested some trivial CUDA C code, and it all works fine.

Then I downloaded the .deb packages (your Install HPC part), added the correct path to PATH and made the symbolic link to /usr/local/cuda-10.2.
Then:
cp -r /opt/nvidia/hpc_sdk/Linux_aarch64/20.7/examples/ ~
cd examples/CUDA-Fortran/TensorCores/m16n16k16/
make

The build fails. Here is the output:
nvfortran -Mcuda -c ../Utils/vector_types.F90
nvfortran -Mcuda -c ../Utils/check_mod.f90
nvfortran -Mcuda -I../Utils -o wmma1.out t1.CUF wmma1.CUF check_mod.o vector_types.o
t1.CUF:
wmma1.CUF:
NVFORTRAN-S-1001-All selected compute capabilities were disabled (see -Minfo) (wmma1.CUF: 1)
0 inform, 0 warnings, 1 severes, 0 fatal for
Makefile:40: recipe for target ‘build’ failed
make: *** [build] Error 2

After adding -Minfo to the Makefile (as suggested), I see the following message:
mod1:
39, warp matrix function disables compute capability 5.0 kernel

I suspect that the CUDA compute capability of the Jetson Nano (5.3) is not sufficient for this.
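As a rough check of that suspicion: warp matrix (Tensor Core) operations require compute capability 7.0 or higher. The sketch below compares the two values in shell. The helper cc_at_least is hypothetical and assumes "major.minor" strings with a single-digit minor.

```shell
# cc_at_least HAVE NEED: succeeds when compute capability HAVE >= NEED.
# Hypothetical helper; both arguments are "major.minor" strings with a
# single-digit minor, e.g. 5.3 or 7.0, compared as the integers 53 and 70.
cc_at_least() {
    have=$(echo "$1" | tr -d '.')
    need=$(echo "$2" | tr -d '.')
    [ "$have" -ge "$need" ]
}

if cc_at_least 5.3 7.0; then
    echo "warp matrix kernels can be built for this GPU"
else
    echo "compute capability 5.3 is below 7.0: warp matrix kernels are disabled"
fi
```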

I then tried a simpler test:
cd ~
cd examples/CUDA-Fortran/CUDA-Fortran-Book/chapter1/increment
make

Here, the build succeeds, but the run fails. Here is the output:

nvfortran -O2 -o increment.out increment.cuf
./increment.out
**** Program Failed ****

After some inspection, I see that no threads were running on the GPU.

Do you have any ideas?

Many thanks for your help

B-

Hi,

Sorry, Tensor Cores do require a GPU architecture >= sm_7.x.
We can compile and execute CUDA-Fortran-Book/chapter1/increment without issue on another platform.

We will confirm this on a Nano board and share more information with you.
Thanks.

Hi,

After checking with our internal team: the HPC compiler doesn’t support the Jetson platform.
It is for Arm server CPU platforms only.

Thanks.

Hi AastaLLL,

Thank you for your reply.

I’m quite surprised and disappointed that the HPC compiler doesn’t support the Nano board, which is designed for educational purposes.
I bought it because the HPC SDK 20.0X appears to be compatible with the arm64 architecture, and I thought I could use some simple functionality such as CUDA Fortran or OpenACC.

Is there another way to run CUDA Fortran code on a Jetson board?

Thanks,

B-

Hi,

No, sorry about that.
However, when we tested this issue, the HPC SDK worked on the Xavier series boards, although they are not officially supported.

Thanks.