nvprof fails to profile a program like the one in this question. Its output is:
No kernels were profiled. No API activities were profiled.
The program is compiled with:
nvc++ -stdpar=gpu -gpu=cc70 -std=c++20
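For anyone who cannot follow the link, a program of this kind might look roughly like the sketch below. It is a hypothetical minimal example, not the code from the linked question: the function name saxpy, the float element type and the vector sizes are my own assumptions. The only thing it relies on is a C++17 parallel algorithm called with std::execution::par_unseq, which nvc++ may offload to the GPU under -stdpar=gpu. With an ordinary host compiler the same code runs on the CPU.

```cpp
// Hypothetical minimal stdpar program (NOT the code from the linked question).
// Build, for example, with:
//   nvc++ -stdpar=gpu -gpu=cc70 -std=c++20 saxpy.cpp
#include <algorithm>
#include <execution>
#include <vector>

// Computes y[i] = a * x[i] + y[i] and returns the updated y.
// The vectors are heap-allocated (std::vector), which is the kind of memory
// nvc++ can manage automatically when offloading with -stdpar=gpu.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         std::vector<float> y) {
    // Parallel, vectorizable transform; writing back into y in place is allowed.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
    return y;
}
```

Calling it from main as saxpy(2.0f, std::vector<float>(1 << 20, 1.0f), std::vector<float>(1 << 20, 2.0f)) should yield 4.0f in every element. Running such a binary under nvprof is the situation in which no kernels or API activities are reported.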
Configuration: gcc 10.2.1, nvc++ 22.5-0, CUDA 11.7, Titan V, driver 515.43.04.
As far as I remember, nvprof did detect these calls with earlier versions of the driver and the HPC SDK.
nvtop does show GPU utilization when the application runs in a loop or is given larger vectors, so the work does reach the GPU. ncu also fails to attach to this program.