Nested stdpar algorithm calls


It looks like nesting stdpar algorithm calls is not possible when targeting GPUs; for example:

std::for_each(std::execution::par, /*...*/, [](int i) {
  // ...
  // Nested parallel algorithm call inside the outer parallel region:
  std::transform(std::execution::par, /*...*/);
  // ...
});

I’m getting the following error when compiling code like that:

NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure

I’m using nvc++ 23.9-0 64-bit target on x86-64 Linux -tp alderlake, but I’ve also seen it with older versions.

Is that a structural limitation (if yes, do you know if it’s documented)? Are there any plans to support it?


Could you please try using par on the outer and unseq on the inner and see if that gets rid of your compilation failure?
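
For reference, here is a minimal sketch of that suggestion. The function name `scale_rows`, the row-major matrix layout, and the doubling operation are all made up for illustration, and `std::execution::unseq` needs C++20. The outer `std::for_each` uses `par`, while the inner `std::transform` uses `unseq`, so each row is processed within a single outer iteration:

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical example: double every element of a rows x cols matrix
// stored row-major in a flat vector (in.size() == rows * cols is assumed).
std::vector<int> scale_rows(const std::vector<int>& in,
                            std::size_t rows, std::size_t cols) {
  std::vector<int> out(in.size());

  // One index per row; the outer algorithm iterates over these.
  std::vector<std::size_t> row_ids(rows);
  std::iota(row_ids.begin(), row_ids.end(), std::size_t{0});

  // Capture raw pointers by value so the lambda does not reference
  // stack objects from the host.
  const int* src = in.data();
  int* dst = out.data();

  std::for_each(std::execution::par, row_ids.begin(), row_ids.end(),
                [=](std::size_t r) {
                  // Inner call uses unseq instead of par: no nested
                  // parallel region is requested here.
                  std::transform(std::execution::unseq,
                                 src + r * cols, src + (r + 1) * cols,
                                 dst + r * cols,
                                 [](int x) { return 2 * x; });
                });
  return out;
}
```

With nvc++ this would typically be built with something like `-stdpar=gpu -std=c++20`; only the iterations of the outer loop are distributed in parallel.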

Yes, that works, but only the outer algorithm runs in parallel this way, right?

Right, and that is currently a limitation. We have an open problem report on it internally, but it is probably not high on the priority list.

Cool, thanks. It’d be exciting to have that one day.

The nested calls in the snippet above can sometimes be flattened into a single parallel std::for_each (or std::transform).
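
A sketch of what that conversion can look like, assuming the inner work is independent per element. The name `add_position` and the per-element formula are hypothetical. A single `std::for_each` runs over a flat index and recovers the row and column from it:

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical example: out[r][c] = in[r][c] + 10 * r + c for a row-major
// rows x cols matrix (in.size() == rows * cols and cols > 0 are assumed).
std::vector<int> add_position(const std::vector<int>& in,
                              std::size_t rows, std::size_t cols) {
  std::vector<int> out(in.size());

  // One flat index per element replaces the (row, column) loop nest.
  std::vector<std::size_t> idx(rows * cols);
  std::iota(idx.begin(), idx.end(), std::size_t{0});

  const int* src = in.data();
  int* dst = out.data();

  std::for_each(std::execution::par, idx.begin(), idx.end(),
                [=](std::size_t i) {
                  const std::size_t r = i / cols;  // outer-loop index
                  const std::size_t c = i % cols;  // inner-loop index
                  dst[i] = src[i] + static_cast<int>(10 * r + c);
                });
  return out;
}
```

Every element becomes its own parallel work item, so no nested algorithm call is needed.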

However, this is not true when the inner algorithm is a scan (e.g. std::transform_inclusive_scan): each output element depends on the ones before it, so the nest cannot simply be flattened, and there is currently no way to run several parallel scans concurrently on the GPU.
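
To make the scan case concrete, here is a sketch of what the par/unseq workaround from earlier in the thread would look like for it. The name `row_prefix_sums_of_squares` and the data layout are hypothetical, and the thread only confirmed the workaround for the std::transform snippet, so treat its applicability to scans as an assumption. The rows are scanned in parallel with each other, but each individual scan is not parallel:

```cpp
#include <cstddef>
#include <algorithm>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical example: per-row running sum of squares for a row-major
// rows x cols matrix (in.size() == rows * cols is assumed).
std::vector<int> row_prefix_sums_of_squares(const std::vector<int>& in,
                                            std::size_t rows,
                                            std::size_t cols) {
  std::vector<int> out(in.size());
  std::vector<std::size_t> row_ids(rows);
  std::iota(row_ids.begin(), row_ids.end(), std::size_t{0});

  const int* src = in.data();
  int* dst = out.data();

  std::for_each(std::execution::par, row_ids.begin(), row_ids.end(),
                [=](std::size_t r) {
                  // Inner scan uses unseq, so it is not a parallel scan:
                  // one outer iteration handles the whole row.
                  std::transform_inclusive_scan(
                      std::execution::unseq,
                      src + r * cols, src + (r + 1) * cols,
                      dst + r * cols,
                      std::plus<>{},
                      [](int x) { return x * x; });
                });
  return out;
}
```

This helps when there are many short rows. With a few long rows, most of the available parallelism stays unused, which is the limitation described above.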