Overhead when using the managed memory flag in OpenACC

Hi,

I am trying to use CUDA managed (unified) memory in my code. As a first step, I just added the flag at compile time, as shown below, without adding any pragma directives to the code.

-fast -acc -ta=tesla:managed -Minfo=accel -Mcuda -lnvToolsExt -O3 -Wall -std=c++0x -fPIC -I$(PROJECT_PATH)

I noticed that the GPUs become active in the nvidia-smi report and start doing transfers, and the code becomes very slow when compiled with these flags.

When I profile, I see data transfers, but I don't know what is causing them.

In your opinion, what could be the reason for this problem?

Thanks for your time

A CUDA context will still be created, which is why you’re seeing the binaries in the nvidia-smi report.

As for the data movement, I’m not sure. Given that there are 8 processes, does the program use MPI? If so, it may be some initialization MPI is doing in order to support CUDA-aware MPI. Or there might be some global parameters being implicitly copied.
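If you happen to be using Open MPI, one quick way to check whether the MPI library was built with CUDA support is the following (this command is Open MPI specific; other MPI implementations report this differently):

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value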

If you set the environment variable “NV_ACC_NOTIFY=3”, does the output show any data movement?
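For example (the launch line is just a sketch; substitute your own MPI launcher, rank count, and executable name):

NV_ACC_NOTIFY=3 mpirun -np 8 ./your_app

A value of 3 enables messages for both kernel launches and data transfers.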

-Mat

Thanks for your reply. Yes, the program uses MPI. I set NV_ACC_NOTIFY=3 and it didn’t show any data movement, so I don’t know what is causing the transfers, and they are taking a lot of time. Is there another way to check what is causing this? I am fairly new to OpenACC, so it’s a bit hard for me to work out what could be going on. Your help would be appreciated.

Ahmed

If it’s not showing up in the NV_ACC_NOTIFY output, then the data movement is not coming from the OpenACC runtime. I would suspect it’s coming from the MPI library.

The next step would be to run the program through Nsight Systems, adding “--trace=cuda,openacc,mpi”. This adds details on the MPI communication and the OpenACC API calls. Also, if you view the timeline in the GUI, you can see when the data movement occurs, which might give clues as to where it is coming from. A sketch of the command is below.
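For example (assuming Open MPI; “your_app” is a placeholder, and the %q{OMPI_COMM_WORLD_RANK} substitution just gives each rank its own report file):

mpirun -np 8 nsys profile --trace=cuda,openacc,mpi -o report_rank%q{OMPI_COMM_WORLD_RANK} ./your_app

You can then open the per-rank reports in the Nsight Systems GUI to inspect the timeline.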

-Mat