How to create a sequential kernel in CUF or CUDA Fortran

Hi Experts,

To avoid unnecessary data movement between host and device, I sometimes need to define a sequential kernel using CUF or CUDA Fortran. I know how to do this in OpenACC (ACC SEQ), but I have no idea how to do it in CUF or CUDA Fortran. Can anyone give me some pointers?

Thank you so much!

Use a schedule of 1 block with 1 vector (one thread), either in an explicit kernel launch or in a CUF kernel directive:

call foo <<<1,1>>> (args)

!$cuf kernel do <<<1,1>>>
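
For reference, the explicit-launch form might look like the sketch below. The module name, the body of foo, and the array a_d are made up for illustration; only the <<<1,1>>> launch comes from the answer above. With one block of one thread, the loop inside the kernel runs sequentially on the device, so a loop-carried dependence such as a running sum is safe.

```fortran
module kernels
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: in-place running sum, executed by a single thread.
  attributes(global) subroutine foo(a, n)
    integer, value :: n
    integer :: a(n)      ! dummy arrays of a global subroutine live on the device
    integer :: i
    do i = 2, n
      a(i) = a(i) + a(i-1)
    end do
  end subroutine foo
end module kernels

program test_seq
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 4
  integer :: a(n), istat
  integer, device :: a_d(n)

  a = 1
  a_d = a                      ! host to device copy
  call foo<<<1,1>>>(a_d, n)    ! 1 block, 1 thread: sequential on the device
  istat = cudaDeviceSynchronize()
  a = a_d                      ! device to host copy
  print *, a                   ! running sum of ones: 1 2 3 4
end program test_seq
```

Compile with nvfortran -cuda, as in the rest of this thread.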

Thank you Mat.

Look at this code:
program test
!@cuf use cudafor

    integer :: II
    integer :: istat

    !$CUF kernel do <<<1,1>>>
    II = 1

    istat = cudaDeviceSynchronize()
    print *, II
end program

I get a compile error:
nvfortran -cuda ./test.F90
NVFORTRAN-S-0104-Illegal control structure - unterminated CUDA KERNEL directive (./test.F90: 7)
0 inform, 0 warnings, 1 severes, 0 fatal for test

If I remove the “do” in the CUF line, I get:

NVFORTRAN-S-0034-Syntax error at or near ‘<<<’ (./test.F90: 7)
0 inform, 0 warnings, 1 severes, 0 fatal for test

Maybe I have made a silly mistake?


Sorry, a CUF kernel directive has to be applied to a loop, so in this case you’ll need to do something like:

        !$CUF kernel do <<<1,1>>>
        do I=1,1
          II = 1
        end do
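
Put together as a full program, this could look like the sketch below. One assumption beyond the snippet above: the result is written to a one-element device array (ii_d, a name made up here) and copied back explicitly, so it is clear where the data lives.

```fortran
program test
  use cudafor
  implicit none
  integer :: ii(1), i, istat
  integer, device :: ii_d(1)   ! hypothetical device array holding the result

  ! The directive must be followed by a do loop; a single-trip loop
  ! launched with 1 block of 1 thread runs the body once on the device.
  !$cuf kernel do <<<1,1>>>
  do i = 1, 1
    ii_d(1) = 1
  end do

  istat = cudaDeviceSynchronize()
  ii = ii_d                    ! device to host copy
  print *, ii(1)
end program test
```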

Note that OpenACC can be intermixed with CUDA Fortran, so it might be easiest to mix in a serial region:

!$acc serial
!$acc end serial
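
A minimal sketch of the serial-region approach, using only standard OpenACC (the array a and the loop bodies are made up for illustration). The array stays on the device for the whole data region, so the sequential step does not trigger a host round trip.

```fortran
program serial_demo
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), i

  !$acc data copyout(a)

  ! Parallel work on the device.
  !$acc parallel loop present(a)
  do i = 1, n
    a(i) = i
  end do

  ! Sequential step on the device, executed by a single thread.
  !$acc serial present(a)
  a(1) = -1
  !$acc end serial

  !$acc end data               ! a is copied back to the host here

  print *, a                   ! a(1) is -1, a(i) is i for the rest
end program serial_demo
```

This builds with nvfortran -acc; when the same file also contains CUDA Fortran code, add -cuda as well.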

Thank you Mat, this works!