I think I’ve uncovered a subtle interaction between NVC++ and an external CUDA installation when using -stdpar=gpu offload.
We have a module environment as is typical on many HPC installations. When I purge my environment and load only nvhpc, -stdpar=gpu works as expected.
However, when I also load the cuda module, -stdpar=gpu shows various runtime problems. Sometimes the run aborts with `failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered`. More troubling, sometimes no runtime error is reported at all, and a simple `std::sort()` just produces wrong results.
I suspect this is an interaction with the copy of Thrust found in the external CUDA installation. Any guidance is appreciated!
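One way to test that suspicion is to compile a tiny translation unit under each module setup and compare which Thrust version it reports. This is only a sketch: `thrust_header_version()` is a hypothetical helper, and it shows which `thrust/version.h` a plain `#include` resolves to. Whether nvc++'s stdpar implementation resolves its internal Thrust headers the same way is an assumption.

```cpp
#include <string>

// Pull in Thrust's version header only if one is visible on the include path.
#if __has_include(<thrust/version.h>)
#include <thrust/version.h>
#endif

// Hypothetical helper: describe the Thrust headers this translation unit was compiled against.
std::string thrust_header_version() {
#ifdef THRUST_VERSION
    // THRUST_VERSION encodes major * 100000 + minor * 100 + subminor;
    // the MAJOR/MINOR/SUBMINOR macros decode it.
    return "Thrust " + std::to_string(THRUST_MAJOR_VERSION) + "." +
           std::to_string(THRUST_MINOR_VERSION) + "." +
           std::to_string(THRUST_SUBMINOR_VERSION);
#else
    return "no Thrust headers found on the include path";
#endif
}
```

Printing the returned string from a small `main()` built with `nvc++ -stdpar`, once with only nvhpc loaded and once with nvhpc and cuda loaded, would show whether the two setups see different Thrust versions.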
-Ben
**nvhpc only, correct:**
```
$ wget https://raw.githubusercontent.com/benkirk/paradigms_playground/master/parallel_stl_sort.C
$ module purge && module load nvhpc && module list
Currently Loaded Modules:
1) ncarenv/22.10 (S) 2) craype/2.7.17 (S) 3) nvhpc/22.7
$ nvc++ -stdpar -o parallel_stl_sort parallel_stl_sort.C && ./parallel_stl_sort
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::seq: 43.127 sec.
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::par: 0.632754 sec.
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::par_unseq: 0.073256 sec.
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
```
**nvhpc+cuda, incorrect:**
```
$ wget https://raw.githubusercontent.com/benkirk/paradigms_playground/master/parallel_stl_sort.C
$ module purge && module load nvhpc cuda && module list
Currently Loaded Modules:
1) ncarenv/22.10 (S) 2) craype/2.7.17 (S) 3) nvhpc/22.7 4) cuda/11.4.4
$ nvc++ -stdpar -o parallel_stl_sort parallel_stl_sort.C && ./parallel_stl_sort
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 5 15 19 22 31 55 60 61 63 88 95 ...
after unique: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
std::copy() / std::sort() / std::unique() / std::execution::seq: 42.8227 sec.
final: v.size()=471992679; 5 15 19 22 31 55 60 61 63 88 95 ...
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 0 0 0 0 0 0 0 0 0 0 0 ...
after unique: v.size()=59328; 765600696 352808383 3641236997 4016398694 1279020192 465826551 864301009 822663315 3257882672 1989160727 4086747794 ...
==> ERROR: size mismatch from serial algorithm!
std::copy() / std::sort() / std::unique() / std::execution::par: 0.61413 sec.
final: v.size()=59328; 765600696 352808383 3641236997 4016398694 1279020192 465826551 864301009 822663315 3257882672 1989160727 4086747794 ...
input: v.size()=500000000; 3499211612 581869302 3890346734 3586334585 545404204 4161255391 3922919429 949333985 2715962298 1323567403 418932835 ...
after sort: v.size()=500000000; 0 0 0 0 0 0 0 0 0 0 0 ...
after unique: v.size()=643648; 2435803498 2970823809 3485536073 2755831796 3868881694 2623710790 2458607871 3552076208 607421919 2528345273 2013025721 ...
==> ERROR: size mismatch from serial algorithm!
std::copy() / std::sort() / std::unique() / std::execution::par_unseq: 0.606054 sec.
final: v.size()=643648; 2435803498 2970823809 3485536073 2755831796 3868881694 2623710790 2458607871 3552076208 607421919 2528345273 2013025721 ...
```
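The linked `parallel_stl_sort.C` is not reproduced here, but the check its output describes (`std::copy()` / `std::sort()` / `std::unique()` under each execution policy, compared against the serial result) can be sketched as below. This is a minimal sketch under assumptions: `make_input`, `sort_unique` and `matches_serial` are hypothetical names, the input is a small `std::mt19937` sequence rather than the 500000000 values above, and it compares the full result against the serial one, which is stricter than the size check behind the `size mismatch from serial algorithm` message.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <execution>
#include <random>
#include <vector>

// Hypothetical helper: reproducible pseudo-random 32-bit input.
std::vector<std::uint32_t> make_input(std::size_t n, std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::vector<std::uint32_t> v(n);
    for (auto& x : v) x = static_cast<std::uint32_t>(gen());
    return v;
}

// Copy, sort and deduplicate under the given execution policy.
// When built with nvc++ -stdpar=gpu, the par and par_unseq variants are
// candidates for GPU offload; seq always runs on the host.
template <class Policy>
std::vector<std::uint32_t> sort_unique(const Policy& policy,
                                       const std::vector<std::uint32_t>& in) {
    std::vector<std::uint32_t> v(in.size());
    std::copy(policy, in.begin(), in.end(), v.begin());
    std::sort(policy, v.begin(), v.end());
    v.erase(std::unique(policy, v.begin(), v.end()), v.end());
    return v;
}

// True if the policy-based result is identical to the serial reference.
template <class Policy>
bool matches_serial(const Policy& policy, const std::vector<std::uint32_t>& in) {
    const auto ref = sort_unique(std::execution::seq, in);
    const auto got = sort_unique(policy, in);
    return got == ref;
}
```

Running `matches_serial(std::execution::par, in)` and `matches_serial(std::execution::par_unseq, in)` under both module setups would flag the same kind of divergence shown in the nvhpc+cuda log without relying on the printed prefixes.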