I have a small modern Fortran code with OpenACC that I want to test on my Jetson Nano Developer Kit (regular 4GB version). I’ve successfully compiled it with NVIDIA HPC SDK 20.11, SBSA version:
$ nvfortran -acc=gpu -Minfo=acc ddot.f90 -o ddot.x
dot_product:
     42, Generating create(result,vecb(:),veca(:)) [if not already present]
     45, Generating present(vecb(:),veca(:))
         Generating copyin(n) [if not already present]
         Generating Tesla code
         46, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     53, Generating present(result)
         Scalar last value needed after loop for result at line 63,66,78,81,79
         Accelerator serial kernel generated
         Generating Tesla code
     63, Generating present(result,vecb(:),veca(:))
         Generating copyin(n) [if not already present]
         Generating Tesla code
         65, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             Generating reduction(+:result)
     73, Generating update self(result)
However, when I try to run it, I get an error message I’ve never seen before:
$ echo 100000000 | ./ddot.x
Input vector length n:
Using n = 100000000
Failing in Thread:1
call to cuModuleLoadDataEx returned error 209: No binary for GPU
Any help would be appreciated.
PS: Yeah, I know NVHPC 21.2 is available now, but I haven’t had time to deploy it on the Jetson Nano yet.