nvc++ doesn't parallelise across CPU cores if -stdpar is not specified

Hi,

I’m testing nvc++ with various combinations of execution policy and -stdpar …

includes="$NVCOMPILERS/$NVARCH/20.5/compilers/include-stdpar"
compile="nvc++ -Wall -fast -I $includes"

$compile                                            -o main_no_policy    main.cpp
$compile         -DPOLICY=std::execution::seq       -o main_seq          main.cpp
$compile         -DPOLICY=std::execution::par_unseq -o main_par_unseq    main.cpp
$compile -stdpar -DPOLICY=std::execution::par_unseq -o main_stdpar_unseq main.cpp

echo "Running serially (no policy)" && ./main_no_policy
echo "Running sequentially" && ./main_seq
echo "Running in parallel unseq without GPU acceleration" && ./main_par_unseq
echo "Running in parallel unseq with    GPU acceleration" && ./main_stdpar_unseq
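
main.cpp itself is not shown in this post. As a rough guide to the kind of test the script builds, here is a minimal sketch. The workload, the function names heavy_transform and time_transform, and the use of std::transform are all assumptions for illustration and not the original code. The only detail taken from the script is the POLICY macro: when it is not defined (the main_no_policy build), the plain overload is called.

```cpp
// Hypothetical stand-in for main.cpp; the original source was not posted.
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>
#ifdef POLICY
#include <execution>  // only needed when an execution policy is passed via -DPOLICY=...
#endif

// Applies a deliberately expensive function to every element in place.
// The workload is an assumption, chosen only to give each element some work;
// mathematically every result is 1.0 (up to rounding).
void heavy_transform(std::vector<double>& data) {
    auto op = [](double x) {
        const double s = std::sin(x);
        const double c = std::cos(x);
        return std::sqrt(s * s + c * c);
    };
#ifdef POLICY
    // e.g. -DPOLICY=std::execution::par_unseq
    std::transform(POLICY, data.begin(), data.end(), data.begin(), op);
#else
    // No policy given: the classic serial overload.
    std::transform(data.begin(), data.end(), data.begin(), op);
#endif
}

// Returns the wall-clock time spent inside heavy_transform.
std::chrono::nanoseconds time_transform(std::vector<double>& data) {
    const auto start = std::chrono::steady_clock::now();
    heavy_transform(data);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

A main function would fill a large std::vector<double>, call time_transform on it, and print the result converted with std::chrono::duration_cast to nanoseconds, microseconds, milliseconds and seconds, which would give output of the shape shown below.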


=======================================
Testing nvc++
=======================================
Running serially (no policy)
Elapsed time in nanoseconds : 4018807479 ns
Elapsed time in microseconds : 4018807 µs
Elapsed time in milliseconds : 4018 ms
Elapsed time in seconds : 4 sec

Running sequentially
Elapsed time in nanoseconds : 4005378936 ns
Elapsed time in microseconds : 4005378 µs
Elapsed time in milliseconds : 4005 ms
Elapsed time in seconds : 4 sec

Running in parallel unseq without GPU acceleration
Elapsed time in nanoseconds : 4005979476 ns
Elapsed time in microseconds : 4005979 µs
Elapsed time in milliseconds : 4005 ms
Elapsed time in seconds : 4 sec
    
Running in parallel unseq with    GPU acceleration
Elapsed time in nanoseconds : 196931098 ns
Elapsed time in microseconds : 196931 µs
Elapsed time in milliseconds : 196 ms
Elapsed time in seconds : 0 sec

So with the execution policy std::execution::par_unseq, the program runs serially on the CPU unless -stdpar is specified, in which case it correctly runs on the GPU.

g++ parallelises the same code across the CPU cores without any problem.

Thanks,

Leigh.

Hi Leigh,

We do not currently support parallel execution with stdpar across multiple CPU cores, so this behavior is expected. It’s something we might add in the future, but for now we only parallelize when targeting GPUs.

-Mat