No kernels were profiled with standard C++ parallelism

nvprof fails to profile a program like the one in this question and reports:

No kernels were profiled.
No API activities were profiled.

Compiler options:

nvc++ -stdpar=gpu -gpu=cc70 -std=c++20
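
A minimal sketch of the kind of program these options apply to, assuming a single parallel algorithm call. The function name `dot` and the vector contents are hypothetical, chosen only for illustration, and are not taken from the linked question:

```cpp
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical reproducer: dot product of two constant vectors.
// The parallel execution policy is what allows nvc++ -stdpar=gpu
// to offload the algorithm to the GPU; other compilers run it on the CPU.
double dot(std::size_t n) {
    std::vector<double> x(n, 1.0), y(n, 2.0);
    // Multiplies the elements pairwise and sums the products, starting from 0.0.
    return std::transform_reduce(std::execution::par_unseq,
                                 x.begin(), x.end(), y.begin(), 0.0);
}
```

Calling `dot` from `main` with a large `n`, or in a loop, should keep the GPU busy long enough for a profiler to have something to capture.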

Configuration: gcc 10.2.1, nvc++ 22.5-0, CUDA 11.7, Titan V, driver 515.43.04.

As far as I remember, nvprof used to detect the calls with earlier versions of the driver and the HPC SDK.

nvtop does show that the GPU is utilized when the application runs in a loop or is given bigger vectors. ncu fails to attach to this program, too.