DO LOOP inside DO CONCURRENT

con2 · December 17, 2020, 2:42pm

Hi,

I tested the NGC Docker version of HPC SDK 20.11 + Ubuntu 20.04.
With the -stdpar=multicore option, the results and speed were the same as with OpenMP.
However, with -stdpar=gpu, the results diverged.
According to the compilation information, the DO loop inside DO CONCURRENT is also parallelized and offloaded to the GPU. Is there any way to prevent this?

Regards,
Con

MatColgrove · December 17, 2020, 6:25pm

Try adding the flag "-acc=noautopar" to disable device auto-parallelization. It's unclear whether this is actually what is causing the divergence, which could be due to a number of reasons, but it's worth a try.

con2 · December 18, 2020, 12:41pm

Thanks for your advice.
Using '-acc=noautopar' did not improve the situation.
Since it works fine with '-stdpar=multicore', could the cause be the type of GPU (Quadro RTX 8000)?
I'm new to GPGPU, so I don't know what to fix.

MatColgrove · December 18, 2020, 3:02pm

Do you have a reproducing example you can share, or provide more details?

con2 · December 30, 2020, 1:56am

Unfortunately, I have nothing to share.
Part of the compilation output is shown below.
The GPU build parallelizes the inner loops at lines 909, 937, and 963; the CPU build does not.
I thought this might be the cause. Is that not the case?

CPU
852, Generating Multicore code
    852, Loop parallelized across CPU threads

926, Generating Multicore code
    926, Loop parallelized across CPU threads

GPU
852, Generating Tesla code
    852, Loop parallelized across CUDA thread blocks ! blockidx%x
    909, Loop parallelized across CUDA threads(32) ! threadidx%x

926, Generating Tesla code
    926, Loop parallelized across CUDA thread blocks ! blockidx%x
    937, Loop parallelized across CUDA threads(32) ! threadidx%x
    963, Loop parallelized across CUDA threads(32) ! threadidx%x
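
Since no reproducer was posted, here is a minimal sketch of the loop shape being discussed: an ordinary DO loop nested inside DO CONCURRENT. It is not con2's code; the program name, array names, and sizes are all hypothetical. Each outer iteration writes only its own element of rowsum, so the outer iterations are independent, and the inner loop is a plain sum that the compiler may or may not choose to spread across GPU threads.

```fortran
program dc_inner_loop
  ! Hypothetical example, not taken from the original poster's code.
  implicit none
  integer, parameter :: n = 1000, m = 64
  real :: a(m, n), rowsum(n)
  integer :: i, j

  ! Fill the array with a known pattern so the result can be checked.
  do i = 1, n
    do j = 1, m
      a(j, i) = real(j)
    end do
  end do

  ! Outer loop: iterations are independent, one output element each.
  do concurrent (i = 1:n)
    rowsum(i) = 0.0
    ! Inner loop: ordinary DO loop accumulating into rowsum(i).
    do j = 1, m
      rowsum(i) = rowsum(i) + a(j, i)
    end do
  end do

  ! Every column holds 1..m, so each sum should equal m*(m+1)/2.
  print *, 'max abs error:', maxval(abs(rowsum - real(m*(m+1)/2)))
end program dc_inner_loop
```

Building a small case like this once with -stdpar=multicore and once with -stdpar=gpu, and comparing the reported error, can help separate a problem in the loop nest itself from something else in the application. Adding -Minfo=accel should print compiler messages similar to those quoted above, though the exact wording may differ between HPC SDK versions.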
