Profiling Numba CUDA with Nsight Systems (nsys)

I tried profiling this example by G. Markall of the Numba CUDA team:

Numba CUDA example from G.Markall

The example runs and nvidia-smi shows GPU activity, but the profile shows no GPU activity at all, only a lot of CPU activity.
Any ideas?

I run it on an Ubuntu server with a compute capability 7.5 GPU, using this command:

nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s none -o nsight_report -f true -x true python kernel_progress.py
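
For readability, here is the same invocation spelled out with what I understand to be the long-form nsys option names, as a small Python sketch that only builds and prints the command. The helper name build_nsys_command and its report parameter are made up for illustration; the flag values are the ones from the command above.

```python
import shlex


def build_nsys_command(script, report="nsight_report"):
    """Build the nsys invocation from this post as an argument list.

    Hypothetical helper for illustration; it does not run nsys.
    """
    return [
        "nsys", "profile",
        "--show-output=true",                    # -w true
        "--trace=cuda,nvtx,osrt,cudnn,cublas",   # -t: APIs to trace
        "--sample=none",                         # -s none: CPU sampling off
        f"--output={report}",                    # -o: report file name
        "--force-overwrite=true",                # -f true
        "--stop-on-exit=true",                   # -x true
        "python", script,
    ]


cmd = build_nsys_command("kernel_progress.py")
# Print a shell-safe version of the command for copy/paste.
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd) unchanged if launching from Python is more convenient than the shell.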

I realize it has been a long time and you probably do not have the results file anymore, but if you do, please submit it.