I am using Nsight to profile my CUDA project and I noticed something odd.
When I use cudaEventRecord to time my process I get ~21 ms, but when I look at the scope in Nsight I get ~38 ms for the same process.
Is it normal to see such a large gap?
Generally speaking, profiling does introduce some overhead; I often see about a 5% slowdown at the application level. Whether the specific numbers you report are "as expected" is impossible to tell without knowing the code being profiled, the specific configuration of the profiler, and the hardware used.
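
For comparison purposes, here is a minimal sketch of event-based timing, so it is clear exactly what the event measurement brackets. The kernel `scale`, the buffer size, and the launch configuration are hypothetical placeholders, not taken from your code; only the `cudaEvent*` calls are the part that matters.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real workload.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // assumed problem size, for illustration only
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are recorded into the default stream, so the elapsed
    // time covers only the work enqueued between the two records.
    cudaEventRecord(start);
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaEventRecord(stop);

    // Block the host until the stop event has actually been reached.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // result is in milliseconds
    printf("elapsed: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

The events here time only what is enqueued in the stream between the two `cudaEventRecord` calls. If the range you are reading in Nsight also covers allocations, copies, or host-side work outside that bracket, the two numbers would not be measuring the same thing, which is one possible (but unconfirmed) contributor to the gap.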