Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)

Originally published at:

This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we extend the matrix transpose example from a previous post to operate on a matrix that is distributed across multiple GPUs. The data layout is shown in Figure 1 for an nx × ny =…

Hi Greg,
I just thought I would let you know that I found your posts on CUDA Fortran extremely useful and easy to follow. A great introduction that is informative and enjoyable.
Maybe I'll take a peek at CUDA Fortran for Scientists and Engineers next.

Thank you for your great work.

Hi Mana,
Thanks for your feedback; I'm glad you enjoy the posts. There are several new CUDA Fortran features slated for 2014 that I'll be writing about in upcoming posts, and I hope you'll find those useful as well.