Accelerating Standard C++ with GPUs Using stdpar

Originally published at: https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/

Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: CUDA C++ requires the use of host and device attributes on functions and the triple-chevron syntax for GPU kernel launches. OpenACC uses #pragmas to control GPU acceleration. Thrust lets you express parallelism portably but uses language…

This was a great article! It appears that all the discussion and examples are based on accelerating standard C++ code without any need for CUDA programming, but only on a single GPU.

From my work so far on multi-GPU programming, using two GPUs and partitioning the data between them has always required some CUDA-related code: for instance, binding an MPI rank or a thread to one of the GPUs, or using CUDA streams to drive multiple GPUs simultaneously. These and other multi-GPU approaches all need to select a device one way or another, which requires CUDA.
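To illustrate the kind of CUDA-specific code being referred to, here is a minimal sketch of the common one-GPU-per-MPI-rank pattern (the round-robin rank-to-device mapping is an assumed convention, not something from the article):

```cuda
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, num_devices = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&num_devices);

    // The CUDA runtime call that stdpar alone cannot express:
    // pin this rank to one of the available GPUs.
    cudaSetDevice(rank % num_devices);

    // ... each rank now runs its stdpar algorithms on its own GPU,
    //     over its own partition of the data ...

    MPI_Finalize();
    return 0;
}
```

The point being made above is that this device-selection step sits outside Standard C++, which is what makes single-source multi-GPU stdpar code impossible today.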

All of these run in the opposite direction of “Accelerating Standard C++ with GPUs Using stdpar”, where the goal is to leave the CPU-based code unchanged (no CUDA runtime API calls, etc.) and simply compile it with NVC++. So I’m very curious whether there is any way around this currently, and if not, whether this is something to look forward to in the future? I’d appreciate any insights here.

Yes, multi-GPU stdpar support is on the roadmap.

Great to hear! Here in August 2021, are there any new developments on the NVC++ compiler using multiple GPUs?

Not yet, I’m afraid. Stay tuned!

I understand that the containers in use must allocate on the heap, not the stack, in order for unified memory to make the data visible to both CPU and GPU. My question is whether, or when, it will be possible for the containers’ memory to be an mmap pointer instead of a RAM pointer?

Hi, thanks for the question. We are working on enabling more memory types such as stack memory for use with the parallel algorithms. mmap memory is not on our near-term roadmap, but I have forwarded your inquiry on to the team.

I’ve created a benchmark for the Standard C++ Parallel STL functions. When compiling it with nvc++ and -stdpar, these functions run much slower than the serial (single-core CPU) versions, and sort() along with stable_sort() produces a segmentation fault (when sorting a vector of 100 million 32-bit integers). This is running on a Dell Alienware laptop with a GeForce RTX 3060 GPU and a 12th Gen Intel 14-core CPU.

Are there certain compiler switches that should be used to produce accelerated results for these functions? I currently use -stdpar and -O3 with nvc++.

When compiling (using nvc++) without -stdpar, all benchmarks, including sort() and stable_sort(), run to completion without a segmentation fault, executing on a single core of the Intel CPU.

Thank you,
-Victor
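For what it’s worth, -stdpar takes a target argument, so the same source can be built for the GPU or for all CPU cores. A compile-command sketch (the file and binary names are illustrative; -stdpar=gpu and -stdpar=multicore are the documented nvc++ variants, but check your toolchain’s documentation):

```shell
# Offload the parallel algorithms to the GPU (the default for -stdpar):
nvc++ -stdpar=gpu -O3 -o bench_gpu bench.cpp

# Run the same parallel algorithms across the CPU cores instead:
nvc++ -stdpar=multicore -O3 -o bench_cpu bench.cpp
```

Comparing the multicore build against the serial build can help separate “the parallel algorithm is slow” from “the GPU offload is slow”.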