New edition of "CUDA Fortran for Scientists and Engineers"

The second edition of "CUDA Fortran for Scientists and Engineers" is now available. The first edition, after 12 years, needed a refresh.
There is a lot of new material in the book, including a chapter on ray tracing.


Thanks for the revamped edition. I skimmed through it today and already learned a couple new things.

The ray-tracing chapter looks super fun. I've seen it done before with MPI (by L. McMillan) and also with co-arrays (in the hands-on examples of R. Bader), but not as feature-rich or as well commented.

I've been studying the FD stencil computation using pencils. I have one question: how does one approach "rounder" stencils, such as those for (higher-order) mixed derivatives? For instance, something like this X-shaped stencil:

   dfxy = (3.0_wp/8.0_wp)*(f(i+1,j+1) - f(i-1,j+1) + f(i-1,j-1) - f(i+1,j-1)) &
        - (3.0_wp/80.0_wp)*(f(i+2,j+2) - f(i-2,j+2) + f(i-2,j-2) - f(i+2,j-2)) &
        + (1.0_wp/360.0_wp)*(f(i+3,j+3) - f(i-3,j+3) + f(i-3,j-3) - f(i+3,j-3))

Does one proceed analogously to the shared Jacobi kernel?

For mixed derivatives, it depends on the domain size. If nx is small, you can load tiles of (nx)x(js-3:je+3) into shared memory to compute derivatives at (nx)x(js:je). Otherwise, the shared-memory tiles need halo cells in both directions, (is-3:ie+3)x(js-3:je+3), to calculate derivatives at (is:ie)x(js:je).
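A minimal sketch of the second approach (a 2-D tile with 3-cell halos on all four sides). This is not code from the book; the kernel name, the tile dimensions BX/BY, and the variable names are illustrative assumptions, and one-sided boundary stencils are not handled:

```fortran
! Sketch: X-shaped mixed-derivative stencil via a 2-D shared-memory tile
! with halos in both directions (illustrative, not from the book)
attributes(global) subroutine dfxy_tiled(f, dfxy, nx, ny)
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: BX = 32, BY = 8   ! tile size; must match the launch configuration
  integer, value :: nx, ny
  real(wp), intent(in) :: f(nx,ny)
  real(wp) :: dfxy(nx,ny)
  ! interior tile plus a 3-cell halo on every side
  real(wp), shared :: fs(-2:BX+3, -2:BY+3)
  integer :: tx, ty, i0, j0, i, j, ii, jj, ig, jg

  tx = threadIdx%x
  ty = threadIdx%y
  i0 = (blockIdx%x-1)*BX   ! global offset of this block's tile
  j0 = (blockIdx%y-1)*BY

  ! cooperatively load the (BX+6) x (BY+6) tile, halos included
  do jj = ty, BY+6, BY
     do ii = tx, BX+6, BX
        ig = i0 + ii - 3
        jg = j0 + jj - 3
        if (ig >= 1 .and. ig <= nx .and. jg >= 1 .and. jg <= ny) &
             fs(ii-3, jj-3) = f(ig, jg)
     end do
  end do
  call syncthreads()

  ! fs(tx,ty) now holds f(i,j); interior points only (no boundary stencils)
  i = i0 + tx
  j = j0 + ty
  if (i >= 4 .and. i <= nx-3 .and. j >= 4 .and. j <= ny-3) then
     dfxy(i,j) = (3.0_wp/8.0_wp)  *(fs(tx+1,ty+1) - fs(tx-1,ty+1) + fs(tx-1,ty-1) - fs(tx+1,ty-1)) &
               - (3.0_wp/80.0_wp) *(fs(tx+2,ty+2) - fs(tx-2,ty+2) + fs(tx-2,ty-2) - fs(tx+2,ty-2)) &
               + (1.0_wp/360.0_wp)*(fs(tx+3,ty+3) - fs(tx-3,ty+3) + fs(tx-3,ty-3) - fs(tx+3,ty-3))
  end if
end subroutine dfxy_tiled
```

Launched as, e.g., `call dfxy_tiled<<<dim3((nx+BX-1)/BX,(ny+BY-1)/BY,1), dim3(BX,BY,1)>>>(f_d, dfxy_d, nx, ny)`, each block reads its tile from global memory once and reuses it for all 13 stencil points.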

If you are not modifying the f() array anywhere in the kernel, you can also try simply declaring it with intent(in) and using f() directly on the right-hand side, in which case its loads go through the read-only cache. This approach may deliver performance comparable to shared memory for far less effort.
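For comparison, a sketch of this simpler variant (again, kernel and variable names are illustrative assumptions): no shared memory or synchronization, just the intent(in) declaration so the compiler can route reads of f through the read-only cache:

```fortran
! Sketch: same X-shaped stencil reading f directly from global memory;
! intent(in) lets the compiler use the read-only cache (illustrative)
attributes(global) subroutine dfxy_rocache(f, dfxy, nx, ny)
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, value :: nx, ny
  real(wp), intent(in) :: f(nx,ny)   ! read-only in this kernel
  real(wp) :: dfxy(nx,ny)
  integer :: i, j

  i = (blockIdx%x-1)*blockDim%x + threadIdx%x
  j = (blockIdx%y-1)*blockDim%y + threadIdx%y
  if (i >= 4 .and. i <= nx-3 .and. j >= 4 .and. j <= ny-3) then
     dfxy(i,j) = (3.0_wp/8.0_wp)  *(f(i+1,j+1) - f(i-1,j+1) + f(i-1,j-1) - f(i+1,j-1)) &
               - (3.0_wp/80.0_wp) *(f(i+2,j+2) - f(i-2,j+2) + f(i-2,j-2) - f(i+2,j-2)) &
               + (1.0_wp/360.0_wp)*(f(i+3,j+3) - f(i-3,j+3) + f(i-3,j-3) - f(i+3,j-3))
  end if
end subroutine dfxy_rocache
```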


I have access to this book (through my university). Are there code downloads like there were for the first edition (called supplementary materials)? I found those very helpful.

The code is available from this repo:
