CUDA Fortran: Unroll in Kernel

Is there a way to unroll a simple do loop inside of a CUDA Fortran kernel?

I tried, amongst others,

!pgi$l unroll = n:2

plus

-Munroll

as a compiler switch, but I can’t convince the compiler to unroll the loop.

Am I using the directive incorrectly? Or is this feature simply not available in kernels, and if so, why not?

Have you tried

pgf90 -Mcuda=unroll

and/or compiling at -O3?
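As a rough sketch, those two suggestions might look like this on the command line. The file name kernel.cuf is a hypothetical placeholder, and whether the loop actually gets unrolled will depend on the compiler version and on the loop itself.

```shell
# kernel.cuf is a placeholder name for the CUDA Fortran source file (assumption).

# Request unrolling in the generated device code via the -Mcuda suboption.
pgf90 -Mcuda=unroll -c kernel.cuf

# Alternatively, raise the optimization level.
pgf90 -O3 -Mcuda -c kernel.cuf

# Or combine both.
pgf90 -O3 -Mcuda=unroll -c kernel.cuf
```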

If those don’t work, could you post an example of your loop, or send it to trs@pgroup.com?

thanks
-dave