nvprof cannot profile a simple kernel from the NVIDIA CUDA Samples

I installed CUDA Toolkit 10.2 on Ubuntu 18.04. The installer reported:

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.2/
Samples:  Installed in /home/myname/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /var/log/cuda-installer.log
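The PATH and LD_LIBRARY_PATH requirements in this summary can be met with two exports. This is a minimal sketch assuming a bash-compatible shell; persisting the lines in ~/.bashrc is an assumption about the setup, not something the installer prescribes:

```shell
# Prepend the CUDA 10.2 directories named by the installer,
# keeping any existing value (the ${VAR:+:${VAR}} form avoids a trailing ':').
export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

On a default Ubuntu configuration, sudo resets the environment (env_reset, secure_path), so these exports do not reach commands run with sudo. The installer's other option, adding /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf (or a file under /etc/ld.so.conf.d/) and running ldconfig as root, does not have that limitation.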

The command

uname -a

prints

Linux myname-ThinkPad-W530 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

The command

nvidia-smi

results in

Wed Dec 18 22:47:31 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K1000M       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P8    N/A /  N/A |    590MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

The command

gcc --version

results in

gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

I navigated to

cd ~/NVIDIA_CUDA-10.2_Samples/0_Simple/vectorAdd

and ran

make

Then I navigated to

cd ~/NVIDIA_CUDA-10.2_Samples/bin/x86_64/linux/release

and ran

./vectorAdd

which printed

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
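The launch configuration in this output is consistent with rounding 50000 elements up to whole blocks of 256 threads. A small sketch of that arithmetic, assuming the sample uses the usual ceil-divide formula (the variable names are illustrative, not taken from the sample source):

```shell
#!/bin/sh
# Reproduce the launch configuration reported by ./vectorAdd,
# assuming the element count is rounded up to whole blocks.
num_elements=50000      # "[Vector addition of 50000 elements]"
threads_per_block=256   # "... blocks of 256 threads"

# Ceil-divide: add (threads_per_block - 1) before the integer division.
blocks=$(( (num_elements + threads_per_block - 1) / threads_per_block ))

# 196 * 256 = 50176 threads are launched; the extra ones past the last
# element are expected to be skipped by a bounds check inside the kernel.
idle=$(( blocks * threads_per_block - num_elements ))

echo "$blocks blocks of $threads_per_block threads ($idle idle threads)"
# prints: 196 blocks of 256 threads (176 idle threads)
```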

Then I ran

sudo /usr/local/cuda-10.2/bin/nvprof --kernels vectorAdd --metrics all ./vectorAdd

which produced

==14634== NVPROF is profiling process 14634, command: ./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==14634== Some kernel(s) will be replayed on device 0 in order to collect all events/metrics.
Replaying kernel "vectorAdd(float const *, float const *, float*, int)" (done)
Failed to launch vectorAdd kernel (error code unknown error)!
==14634== Error: Internal profiling error 4107:999.
======== Error: CUDA profiling error.