The MVAPICH2 1.8a1 release is targeted for MVAPICH2 users to harness performance on InfiniBand (Mellanox) clusters with NVIDIA GPU adapters and CUDA support. The OMB 3.5 release is targeted for MPI users to carry out benchmarking and performance evaluation of MPI stacks on clusters with NVIDIA GPU adapters and CUDA support.
The new features related to CUDA support in MVAPICH2 1.8a1 (since the MVAPICH2 1.7GA release) are listed here.
- Support for MPI communication from NVIDIA GPU device memory
  (see the sketch after this list)
- High performance RDMA-based inter-node point-to-point
  communication (GPU-GPU, GPU-Host and Host-GPU)
- High performance intra-node point-to-point communication
  for multi-GPU adapters/node (GPU-GPU, GPU-Host and Host-GPU)
- Communication with contiguous datatype
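As a rough illustration (not code from the release itself), the following minimal sketch shows how an application can pass a CUDA device pointer directly to MPI_Send/MPI_Recv when MVAPICH2 is built with CUDA support; the message size is an arbitrary choice and error checking is omitted for brevity.

    /* Minimal sketch: point-to-point communication directly from GPU
     * device memory with a CUDA-enabled MVAPICH2 build. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int count = 1 << 20;          /* number of integers (arbitrary) */
        int *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, count * sizeof(int));

        if (rank == 0) {
            /* The device pointer is handed straight to MPI; the library
             * performs the required GPU-GPU / GPU-Host data movement. */
            MPI_Send(d_buf, count, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, count, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

The point of this support is that explicit cudaMemcpy staging between device and host around the MPI calls is no longer needed in application code.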
New features and enhancements of OSU Micro-Benchmarks (OMB) 3.5 (since the OMB 3.4 release) are listed here.
- Extension of osu_latency, osu_bw, and osu_bibw benchmarks to
  evaluate the performance of MPI_Send/MPI_Recv operations with
  NVIDIA GPU device and CUDA support
    - This functionality is exposed when configured
      with the --enable-cuda option
    - Flexibility for using buffers in NVIDIA GPU device (D)
      and host memory (H) (see the sketch after this list)
    - Flexibility for selecting data movement between D->D,
      D->H and H->D
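As a hedged sketch (the helper below is hypothetical and not the actual OMB code), a CUDA-enabled benchmark can place its communication buffer in device or host memory according to a 'D' or 'H' selection along these lines:

    /* Illustrative only: choose GPU device ('D') or host ('H') memory
     * for a benchmark buffer, in the spirit of the --enable-cuda
     * extension. The helper name alloc_buffer is hypothetical. */
    #include <stdlib.h>
    #include <cuda_runtime.h>

    static void *alloc_buffer(char where, size_t bytes)
    {
        void *buf = NULL;

        if (where == 'D') {
            /* Device buffer: the pointer is later passed directly to
             * MPI_Send/MPI_Recv for D->D, D->H or H->D measurements. */
            cudaMalloc(&buf, bytes);
        } else {
            /* Host buffer */
            buf = malloc(bytes);
        }
        return buf;
    }

With buffer placement selected per side at run time, the same benchmark binary can measure D->D, D->H and H->D transfers.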
Sample performance numbers for MPI communication from NVIDIA GPU memory using MVAPICH2 1.8a1 and OMB 3.5 can be obtained from the following URL:
http://mvapich.cse.ohio-state.edu/performance/gpu.shtml
To download MVAPICH2 1.8a1 and OMB 3.5, along with the associated user guide and quick start guide, and to access the SVN repository, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning, patches, and enhancements are welcome. Please post them to the mvapich-discuss mailing list (mvapich-discuss@cse.ohio-state.edu).
Thanks,
The MVAPICH Team