Dear All,
I have been using explicit data management in my OpenACC code, and the code works on multiple GPUs. In my implementation, I can group GPUs into different MPI communicators, which makes it easier to couple multiple components. For GPUs in different communicators, the communication has to go through the host.
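For context, here is a minimal sketch of the setup I described above (not my actual code; the split rule, array name, and sizes are placeholders for illustration only):

```c
#include <stdlib.h>
#include <mpi.h>
#include <openacc.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* One MPI rank per GPU: bind each rank to a device on its node. */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus > 0)
        acc_set_device_num(world_rank % ngpus, acc_device_nvidia);

    /* Group ranks into component communicators (e.g. first half = component 0). */
    int color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm comp_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &comp_comm);

    const int N = 1 << 20;
    double *a = (double *)malloc(N * sizeof(double));

    /* Explicit data management: device copies created and updated by hand. */
    #pragma acc data create(a[0:N])
    {
        #pragma acc parallel loop
        for (int i = 0; i < N; ++i)
            a[i] = (double)i;

        /* Exchange between components goes through the host:
           update the host copy, communicate with MPI, update the device copy. */
        #pragma acc update self(a[0:N])
        /* ... MPI_Sendrecv(a, ...) with a rank in the other communicator ... */
        #pragma acc update device(a[0:N])
    }

    free(a);
    MPI_Comm_free(&comp_comm);
    MPI_Finalize();
    return 0;
}
```

The managed-memory build I mention below is identical except that -gpu=managed is added to the compile line.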
Out of curiosity, I also compiled the code with -gpu=managed. For a single GPU, this version is slower than the one without -gpu=managed, which is what I expected.
But for my multi-GPU case (with the GPUs in different communicators), the code with -gpu=managed is actually faster (almost twice as fast). This is not what I expected, and I don’t understand why.
Could someone please offer a possible explanation? It could point me in the right direction for optimizing the code for the multi-GPU case.
Thanks
Feng