This is a tough question to answer as you haven’t provided any details. What are you measuring (kernel duration, process duration, etc.)? How are you measuring time? Is this GPU or CPU time? What are the reported times? What GPU is the code running on?
The profilers do change a few behaviors:
- disable some power management when capturing PM counters
- increase the GPU timer frequency from 1 MHz to 31.25 MHz
- measure kernel execution time more precisely than is possible with CUDA events
- increase CPU overhead
- flush work to the GPU faster (this applies on Windows, not Linux)
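
To put the timer frequency change in perspective, here is a rough sketch of the arithmetic in Python. It only converts the two frequencies above into tick lengths and, assuming a measurement can be off by about one tick, shows what that means for a short kernel. The 5 µs kernel duration and the helper names are made up for illustration; they are not taken from any profiler API.

```python
def tick_ns(freq_hz: float) -> float:
    """Length of one timer tick in nanoseconds for a given timer frequency."""
    return 1e9 / freq_hz


def one_tick_error_pct(kernel_ns: float, tick: float) -> float:
    """Size of a one-tick error relative to the kernel duration, in percent.

    Assumption: the timestamp error is on the order of one tick.
    """
    return 100.0 * tick / kernel_ns


default_tick = tick_ns(1e6)        # 1 MHz timer
profiler_tick = tick_ns(31.25e6)   # 31.25 MHz timer

kernel_ns = 5_000.0                # hypothetical 5 us kernel

print(f"1 MHz tick:     {default_tick:.0f} ns")
print(f"31.25 MHz tick: {profiler_tick:.0f} ns")
print(f"one-tick error at 1 MHz:     {one_tick_error_pct(kernel_ns, default_tick):.2f} %")
print(f"one-tick error at 31.25 MHz: {one_tick_error_pct(kernel_ns, profiler_tick):.2f} %")
```

So one tick is 1000 ns at 1 MHz versus 32 ns at 31.25 MHz. For short kernels that difference alone can account for a visible gap between what the profiler reports and what you measure yourself.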