I am aware of specifying multiple cc versions to create a fat binary, and already do so.
I was just wondering whether PGI offers a way to embed PTX, so that an executable built with CUDA 8.0 could still run on Volta hardware, i.e. by embedding PTX for JIT compilation at runtime on more modern hardware (-gencode arch=compute_XX,code=compute_XX).
Using the appropriate -gencode argument with nvcc does allow this.
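For reference, here is a rough sketch of the kind of nvcc flag set I mean: machine code (SASS) for each listed compute capability, plus PTX for the highest one so that newer GPUs can JIT-compile it. The helper name gencode_flags, the capabilities 35 and 60, and the file name kernels.cu are only placeholders for illustration, and the script prints the compile line rather than running it.

```shell
#!/bin/sh
# Build a list of nvcc -gencode flags.
# Arguments: compute capabilities without the dot, e.g. 35 60.
# gencode_flags is a hypothetical helper name, not part of any toolkit.
gencode_flags() {
    if [ "$#" -eq 0 ]; then
        echo "usage: gencode_flags CC [CC...]" >&2
        return 1
    fi
    flags=""
    highest=0
    for cc in "$@"; do
        # code=sm_XX embeds machine code for that exact architecture
        flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
        if [ "$cc" -gt "$highest" ]; then
            highest=$cc
        fi
    done
    # code=compute_XX embeds PTX, which the driver can JIT-compile
    # at runtime on architectures newer than those listed above
    flags="$flags -gencode arch=compute_${highest},code=compute_${highest}"
    # Strip the leading space before printing
    echo "${flags# }"
}

# Print an example compile line; kernels.cu is a placeholder file name.
echo "nvcc $(gencode_flags 35 60) -c kernels.cu -o kernels.o"
```

Only the last -gencode entry matters for forward compatibility; the sm_XX entries just avoid the JIT cost on the architectures already known at build time.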
For one of the executables we build using pgfortran, we no longer use OpenACC, but instead link against CUDA C object files (which include PTX). The executable produced (using -cudalibs and -Mcuda=…) works correctly on the architectures listed in the -Mcuda arguments but does not work on newer architectures.
My current thinking is to use nvcc to link the Fortran and CUDA C object files instead, which might allow the embedded PTX to be carried through to the final executable.
I would need to find the correct linker arguments for this to work.