I’ve recently been trying to learn GPU programming in Fortran. I’m using the nvfortran compiler from the HPC SDK on the Jetson Nano 2GB development kit. I would really appreciate some help with the errors I get when running the following code (compiled with the command nvfortran saxpy_gpu.f90):

module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy 
end module mathOps

program testSaxpy
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 40000
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock

  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)

  x = 1.0; y = 2.0; a = 2.0
  x_d = x
  y_d = y
  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
  y = y_d
  write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy

The program compiles without any complaints, but when run it gives the wrong result. Running it under cuda-memcheck produces the following output:

$ cuda-memcheck a.out
========= Program hit cudaErrorNoKernelImageForDevice (error 209) due to "no kernel image is available for execution on the device" on CUDA API call to cudaLaunchKernel. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2fdb04]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.10.2 (cudaLaunchKernel + 0x1ac) [0x4b6a4]
=========     Host Frame:/opt/nvidia/hpc_sdk/Linux_aarch64/20.7/compilers/lib/libcudafor.so (__pgiLaunchKernel + 0x278) [0x89f80]
 Max error:     2.000000    
========= ERROR SUMMARY: 1 error

where max error should be 0.0.

I can compile and run the matrixMul and simplePrintf nvcc sample programs on the Jetson Nano without issue (after following the advice here to sudo chmod a+rw /dev/nvhost-dbg-gpu). I suspect the problem is due to the use of the attributes(global) subroutine, as I can compile and run the simple device query program described here.

I’m running:
Jetpack 4.6, with CUDA 10.2. HPC SDK 20.7 is installed.
nvfortran is run after loading the pre-made modulefile, with CUDA_HOME=/usr/local/cuda and with PATH and LD_LIBRARY_PATH also pointing to /usr/local/cuda.


Usually, the ‘no kernel image is available for execution’ error is caused by compiling for the wrong GPU architecture.
For the Nano, the GPU compute capability is 5.3.

However, please note that the HPC compiler (NVC++, NVC, NVFORTRAN) doesn’t support the Jetson platform.
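For reference, on a platform where nvfortran is supported, the mismatch can be surfaced from inside the program instead of through cuda-memcheck. The following is only a sketch: the program name checkSaxpy and the variables prop and istat are illustrative and not part of the original post. It prints the compute capability of device 0 and checks the status of the kernel launch, so that a "no kernel image" failure is reported instead of silently leaving y unchanged.

```fortran
module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module mathOps

program checkSaxpy   ! hypothetical name, for illustration only
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 40000
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock
  type(cudaDeviceProp) :: prop
  integer :: istat

  ! Report the compute capability of device 0; the code must be
  ! compiled for a matching (or compatible) architecture.
  istat = cudaGetDeviceProperties(prop, 0)
  if (istat /= cudaSuccess) then
    write(*,*) 'Device query failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if
  write(*,'(a,i0,a,i0)') 'Compute capability: ', prop%major, '.', prop%minor

  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)

  x = 1.0; y = 2.0; a = 2.0
  x_d = x
  y_d = y

  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)

  ! A failed launch does not stop the program by itself,
  ! so query the launch status explicitly.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) then
    write(*,*) 'Kernel launch failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  ! Wait for the kernel and catch errors raised during execution.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) then
    write(*,*) 'Kernel execution failed: ', trim(cudaGetErrorString(istat))
    stop 1
  end if

  y = y_d
  write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program checkSaxpy
```

If the binary contains no code for the device's architecture, the launch check should report the same error text that cuda-memcheck shows, and the program stops before printing a misleading Max error value.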


Hi AastaLLL,

I’m aware that nvcc has an option for specifying the architecture, but the closest thing I found for nvfortran is specifying the compute capability with -gpu=cc50. Is this the right compiler option to use? With that option, I get a warning that -gpu has no effect without additional language-specific options.

Yeah I did gather on my googling journey that Jetson wasn’t supported. Do you know if support will ever come to it?

The Nano GPU architecture is compute capability 5.3, which is sm_53 for nvcc:

$ nvcc -gencode arch=compute_53,code=sm_53 xxx.cu

Currently, we don’t have a concrete schedule to support nvfortran on Jetson.


