Wondering if anyone has had a similar problem.
I created a binary that runs some simple calculations in CUDA. With the same breakpoint set, it takes significantly longer to reach in VSCode than when I run the binary in cuda-gdb. As far as I know, Nsight also uses cuda-gdb, so this is very strange.
This happens consistently no matter what kernel code I write, and it has persisted across several reinstalls of my computer, so as far as I can tell it is a consistent symptom.
If anyone has had a similar experience or has any insight, I would greatly appreciate it.