Nsight and Visual Profiler timelines show different results for concurrent kernels

When I profile my concurrent-kernel practice program with Nsight, the independent kernels do not overlap in the timeline. With nvvp (Visual Profiler) they overlap as expected. Is this a bug?

Nsight screenshot: http://i.imgur.com/Pzcht22.png?1

Visual Profiler screenshot: http://i.imgur.com/11OWrVQ.png


  1. Display Info: GeForce GTX 660 Ti / Driver: 327.23
  2. CUDA toolkit: 5.5
  3. Nsight: CUDA Kernel Trace Mode is already set to “Concurrent”
  4. Visual Profiler: 5.5.0
  5. IDE: Visual Studio 2010 4.0.30319


Which application is this? Are you using exactly the same sources in both cases?
What about the GPU, is it the same one when you use the Visual Profiler as when you use Nsight VSE?

-> The results come from profiling my own program,
but I get the same result when profiling the CUDA Sample (concurrent kernel).

-> Yes, I use exactly the same source to profile with Nsight and with the Visual Profiler.

-> I have only one graphics card in my PC (GTX 660 Ti), so the GPU is the same in both cases.

Thanks for your response.

Thanks for the info.
Is there any way I could get a hold of your project?

Hi rafi

I’m happy to give you my project, but I’m not sure why it is needed.
If I open the CUDA Sample project (concurrent kernel) and profile it with Nsight,
the timeline also shows the kernels running serially.
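
For context, the kind of launch pattern being discussed looks roughly like the sketch below. This is not my project and not the SDK sample itself, just a minimal illustration: the kernel name spinKernel, the stream count, and the spin length are all made up for the example. Each kernel goes into its own non-default stream, so the kernels are independent and the hardware is allowed to overlap them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for about `cycles` clock ticks so that each
// launch is long enough to be visible on a profiler timeline.
__global__ void spinKernel(long long cycles, int *done)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // busy-wait
    }
    *done = 1;  // mark this launch as finished
}

int main()
{
    const int kNumStreams = 4;                  // assumption: arbitrary count
    const long long kSpinCycles = 50000000LL;   // assumption: arbitrary length

    cudaStream_t streams[kNumStreams];
    int *d_done = NULL;
    int h_done[kNumStreams] = {0};

    if (cudaMalloc(&d_done, kNumStreams * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_done, 0, kNumStreams * sizeof(int));

    for (int i = 0; i < kNumStreams; ++i)
        cudaStreamCreate(&streams[i]);

    // One small kernel per stream: no dependencies between the launches,
    // so they are candidates for concurrent execution.
    for (int i = 0; i < kNumStreams; ++i)
        spinKernel<<<1, 1, 0, streams[i]>>>(kSpinCycles, d_done + i);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel run failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_done, d_done, kNumStreams * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < kNumStreams; ++i)
        printf("stream %d finished: %d\n", i, h_done[i]);

    for (int i = 0; i < kNumStreams; ++i)
        cudaStreamDestroy(streams[i]);
    cudaFree(d_done);
    return 0;
}
```

With a pattern like this, the kernels would be expected to appear side by side on the timeline when they really do run concurrently; whether a given profiler draws them that way is exactly the question in this thread.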

Hi flysnow,

I’ve sent you a private message with instructions on sending your project to us. We will try to reproduce the issue you are seeing.