CUDA Profiler


I have two CUDA kernels in my program. When I run it through the CUDA Visual Profiler, it reports results only for the last executed kernel. Since my kernels are independent, I was able to switch the order of invocation; again, only the last executed kernel is reported.
When I run the sample DCT, which has multiple kernels, all of them are reported.
What might I be missing?

Edit: It looks like cudaThreadExit(), which was called after each kernel invocation, is causing this behavior. Could anyone confirm this, please?

If the CUDA Visual Profiler reports only the last executed kernel, it means that your program actually ran only that kernel (the first kernel was not executed). The profiler reports every kernel that executes, so you should confirm that your program really ran both kernels correctly.

If I remove the cudaThreadExit() call after the first kernel, the profiler reports on both kernels. By the way, I am using the 2.2.05 profiler on Windows XP.
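
For anyone hitting the same thing, here is a minimal sketch of the arrangement that worked for me: both kernels are launched first, and cudaThreadExit() is called only once, at the very end. The kernel names (scaleKernel, offsetKernel), the data size, and the launch configuration are made up for illustration and are not my actual code. I have not checked why the intermediate call hides the first kernel; this only reflects the behavior described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the two independent kernels.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

__global__ void offsetKernel(float *data, int n, float offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

int main()
{
    const int n = 1024;                              // assumed problem size
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;

    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleKernel<<<blocks, threads>>>(dev, n, 2.0f);
    // No cudaThreadExit() here: calling it at this point is what made
    // the profiler drop this kernel from its report in my case.

    offsetKernel<<<blocks, threads>>>(dev, n, 3.0f);

    // cudaMemcpy blocks until the preceding kernels have finished.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Tear down the context once, after all kernels have run.
    // (Newer toolkits deprecate this call in favor of cudaDeviceReset().)
    cudaThreadExit();

    printf("host[0] = %f\n", host[0]);
    return (host[0] == 5.0f) ? 0 : 1;
}
```

With this layout, both kernels should show up in the profiler output, matching what I saw after removing the intermediate call.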