How to inline PTX with nvfortran

Hello,

I was wondering what the syntax would be for inline PTX in CUDA Fortran kernels. I'd hope the functionality exists without having to interface to CUDA C. I initially assumed it would be the same as CUDA C, e.g. just call
asm("prefetch.global.L1 [%0];" : : "r"(var) )
(the above taken from a CUDA C post about prefetching, where I replaced ptr with var)

However, compiling this gives me a syntax error:
“NVFORTRAN-S-0034-Syntax error at or near ) (reduction.cuf: 254)
0 inform, 0 warnings, 1 severes, 0 fatal for device_reduce_warp_memaccesses_vec4_vectorized_prefetch
call_reduction.cuf:”

I tried finding documentation of this in the CUDA Fortran programming guide, as well as the PTX guide, but no luck.

For context, I wanted to experiment with prefetching, since the above reduction kernel is severely limited by long scoreboard stalls. However, I can see myself playing with inline PTX in other contexts, so I would like to know the syntax in CUDA Fortran.

Sorry, ASM statements aren’t supported in Fortran.
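
For comparison, here is a minimal sketch of what the inline PTX from the question might look like on the CUDA C side, where asm statements are accepted. It is only an illustration: the names prefetch_l1, sum_kernel and sum_on_device are hypothetical, the launch configuration is arbitrary, and the pointer is assumed to refer to global memory. The sketch uses the "l" constraint (64-bit register) rather than "r", since the operand is an address on a 64-bit target. Whether a helper like this can be called from a CUDA Fortran kernel is a separate question that this sketch does not settle.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: issue a PTX prefetch hint for the address in `ptr`.
// Assumption: `ptr` points into global memory. "l" binds a 64-bit register,
// which is what an address operand needs on 64-bit targets.
__device__ __forceinline__ void prefetch_l1(const void* ptr) {
    asm volatile("prefetch.global.L1 [%0];" : : "l"(ptr));
}

// Hypothetical grid-stride sum: each thread prefetches the element it will
// read on its next iteration, then accumulates the current one.
__global__ void sum_kernel(const float* in, int n, float* out) {
    int stride = gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        int next = i + stride;
        if (next < n) {
            prefetch_l1(in + next);  // hint only; does not change the result
        }
        acc += in[i];
    }
    atomicAdd(out, acc);  // combine per-thread partial sums
}

// Hypothetical host wrapper: copy data to the device, run the kernel,
// and return the sum. Error checking is omitted to keep the sketch short.
float sum_on_device(const std::vector<float>& host) {
    int n = static_cast<int>(host.size());
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));
    sum_kernel<<<4, 128>>>(d_in, n, d_out);  // arbitrary launch configuration
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Compiled with nvcc, this could be exercised by calling sum_on_device on a std::vector<float> from a small main. Since a prefetch is only a hint, any benefit for long scoreboard stalls would have to be measured in a profiler rather than assumed.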

I see! Thanks for letting me know. It’s not the end of the world to tinker with the intermediate PTX file, so I’ll work on that.