New edition of "CUDA Fortran for Scientists and Engineers"

The second edition of "CUDA Fortran for Scientists and Engineers" is now available. The first edition, after 12 years, needed a refresh.
There is a lot of new material in the book, including a chapter on ray tracing.


Thanks for the revamped edition. I skimmed through it today and already learned a couple new things.

The ray-tracing chapter looks super fun. I've seen it done before with MPI (by L. McMillan) and also with co-arrays (in the hands-on examples of R. Bader), but not as feature-rich or as well commented.

I've been studying the FD stencil computation using pencils. I have one question: how does one approach "rounder" stencils, such as those for (higher-order) mixed derivatives? For instance, something like this X-shaped stencil:

   dfxy = (3.0_wp/8.0_wp)*(f(i+1,j+1) - f(i-1,j+1) + f(i-1,j-1) - f(i+1,j-1)) &
        - (3.0_wp/80.0_wp)*(f(i+2,j+2) - f(i-2,j+2) + f(i-2,j-2) - f(i+2,j-2)) &
        + (1.0_wp/360.0_wp)*(f(i+3,j+3) - f(i-3,j+3) + f(i-3,j-3) - f(i+3,j-3))

Does one proceed analogously to the shared Jacobi kernel?

For mixed derivatives, it depends on the domain size. If nx is small, you can load tiles of (nx)x(js-3:je+3) into shared memory to compute derivatives at (nx)x(js:je). Otherwise, the shared-memory tiles need halo cells in both directions, (is-3:ie+3)x(js-3:je+3), to calculate derivatives at (is:ie)x(js:je).
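A minimal sketch of the second approach (a 2-D tile with 3-cell halos on all four sides). This is not code from the book; the kernel name, the tile dimensions BX/BY, and the variable names are illustrative assumptions, and one-sided boundary stencils are not handled:

```fortran
! Sketch: X-shaped mixed-derivative stencil via a 2-D shared-memory tile
! with halos in both directions (illustrative, not from the book)
attributes(global) subroutine dfxy_tiled(f, dfxy, nx, ny)
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: BX = 32, BY = 8   ! tile size; must match the launch configuration
  integer, value :: nx, ny
  real(wp), intent(in) :: f(nx,ny)
  real(wp) :: dfxy(nx,ny)
  ! interior tile plus a 3-cell halo on every side
  real(wp), shared :: fs(-2:BX+3, -2:BY+3)
  integer :: tx, ty, i0, j0, i, j, ii, jj, ig, jg

  tx = threadIdx%x
  ty = threadIdx%y
  i0 = (blockIdx%x-1)*BX   ! global offset of this block's tile
  j0 = (blockIdx%y-1)*BY

  ! cooperatively load the (BX+6) x (BY+6) tile, halos included
  do jj = ty, BY+6, BY
     do ii = tx, BX+6, BX
        ig = i0 + ii - 3
        jg = j0 + jj - 3
        if (ig >= 1 .and. ig <= nx .and. jg >= 1 .and. jg <= ny) &
             fs(ii-3, jj-3) = f(ig, jg)
     end do
  end do
  call syncthreads()

  ! fs(tx,ty) now holds f(i,j); interior points only (no boundary stencils)
  i = i0 + tx
  j = j0 + ty
  if (i >= 4 .and. i <= nx-3 .and. j >= 4 .and. j <= ny-3) then
     dfxy(i,j) = (3.0_wp/8.0_wp)  *(fs(tx+1,ty+1) - fs(tx-1,ty+1) + fs(tx-1,ty-1) - fs(tx+1,ty-1)) &
               - (3.0_wp/80.0_wp) *(fs(tx+2,ty+2) - fs(tx-2,ty+2) + fs(tx-2,ty-2) - fs(tx+2,ty-2)) &
               + (1.0_wp/360.0_wp)*(fs(tx+3,ty+3) - fs(tx-3,ty+3) + fs(tx-3,ty-3) - fs(tx+3,ty-3))
  end if
end subroutine dfxy_tiled
```

Launched as, e.g., `call dfxy_tiled<<<dim3((nx+BX-1)/BX,(ny+BY-1)/BY,1), dim3(BX,BY,1)>>>(f_d, dfxy_d, nx, ny)`, each block reads its tile from global memory once and reuses it for all 13 stencil points.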

If you are not modifying the f() array anywhere in the kernel, you can also try simply declaring it with intent(in) and using f() directly on the right-hand side, in which case its loads go through the read-only cache. This approach may deliver performance comparable to shared memory for far less effort.
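For comparison, a sketch of this simpler variant (again, kernel and variable names are illustrative assumptions): no shared memory or synchronization, just the intent(in) declaration so the compiler can route reads of f through the read-only cache:

```fortran
! Sketch: same X-shaped stencil reading f directly from global memory;
! intent(in) lets the compiler use the read-only cache (illustrative)
attributes(global) subroutine dfxy_rocache(f, dfxy, nx, ny)
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, value :: nx, ny
  real(wp), intent(in) :: f(nx,ny)   ! read-only in this kernel
  real(wp) :: dfxy(nx,ny)
  integer :: i, j

  i = (blockIdx%x-1)*blockDim%x + threadIdx%x
  j = (blockIdx%y-1)*blockDim%y + threadIdx%y
  if (i >= 4 .and. i <= nx-3 .and. j >= 4 .and. j <= ny-3) then
     dfxy(i,j) = (3.0_wp/8.0_wp)  *(f(i+1,j+1) - f(i-1,j+1) + f(i-1,j-1) - f(i+1,j-1)) &
               - (3.0_wp/80.0_wp) *(f(i+2,j+2) - f(i-2,j+2) + f(i-2,j-2) - f(i+2,j-2)) &
               + (1.0_wp/360.0_wp)*(f(i+3,j+3) - f(i-3,j+3) + f(i-3,j-3) - f(i+3,j-3))
  end if
end subroutine dfxy_rocache
```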


I have access to this book (through my university). Are there code downloads like there were for the first edition (called supplementary materials)? I found those very helpful.

The code is available from this repo:
