CUDA Pro Tip: Profiling MPI Applications

Originally published at: CUDA Pro Tip: Profiling MPI Applications | Parallel Forall | NVIDIA Technical Blog

When I profile MPI+CUDA applications, performance issues sometimes occur only for certain MPI ranks. To fix these, it’s necessary to identify the MPI rank where the performance issue occurs. Before CUDA 6.5 this was hard to do because the CUDA profiler showed only the PID of each process and left the developer to figure…

Is this code compatible with the CUDA cores on a cluster of Jetson TK1s?