I have a question about CUDA-aware MPI and GPUDirect. I am writing a paper in which I claim to be using GPUDirect, but a reviewer commented that I am only using CUDA-aware MPI.

On the cluster I am using, I have CUDA 8.0.61, PGI 18.1, and MVAPICH2-GDR/2.3b installed. In my OpenACC code, I use host_data use_device to pass device buffers directly to MPI send/receive calls, so data moves between GPUs within a node without being staged through host memory. I know GPUDirect RDMA is not working in my present situation, but I also understand that GPUDirect is an umbrella term covering several levels of functionality, not just RDMA.
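
For reference, here is a minimal sketch of the pattern I am describing. The buffer names, the buffer size, and the two-rank exchange are just placeholders for illustration, not my actual code:

```c
#include <mpi.h>

#define N 1024  /* placeholder buffer size */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double sendbuf[N], recvbuf[N];
    for (int i = 0; i < N; i++) sendbuf[i] = rank;

    #pragma acc data copyin(sendbuf) copyout(recvbuf)
    {
        /* host_data exposes the device addresses of the buffers to MPI,
           so a CUDA-aware MPI (here MVAPICH2-GDR) can move the data
           GPU-to-GPU without staging it through host memory. */
        #pragma acc host_data use_device(sendbuf, recvbuf)
        {
            int peer = (rank + 1) % 2;  /* assumes exactly 2 ranks */
            MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                         recvbuf, N, MPI_DOUBLE, peer, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Given this setup, is it accurate to say I am using GPUDirect (e.g., peer-to-peer within a node), or is the reviewer right that this only qualifies as CUDA-aware MPI? Could someone provide some thoughts on this? Thanks!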