Visual Profiler kernel call count bug? Undercounting kernel calls


The CUDA Visual Profiler is undercounting the number of calls for at least one of my kernels: instead of several tens or hundreds of calls, it reports only 2 or 4. The count is also inconsistent between profiler sessions, even though the program executes in exactly the same way each time.

Has anyone else experienced this?

Any ideas?



The profiler writes its output to a file, one per CUDA context, when the context is synchronized. If your application does not terminate properly and that context synchronization never happens, the profiler output can be incomplete. Could you attach the CSV output from two different sessions that show this problem?
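Since output is only written on a context synchronize, one thing worth checking is that the program explicitly synchronizes and tears down its context before exiting. The sketch below assumes the CUDA 4.0+ runtime API (`cudaDeviceSynchronize`, `cudaDeviceReset`); on the older toolkits the Visual Profiler shipped with, the equivalents were `cudaThreadSynchronize` and `cudaThreadExit`. `scaleKernel` and `runAndFlush` are hypothetical names used only for illustration, and whether a particular profiler version flushes on `cudaDeviceReset` specifically is an assumption, not something stated in the reply above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel whose calls are being undercounted.
__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: issues `launches` kernel calls, then synchronizes the
// device and destroys the context explicitly, so the profiler gets a chance
// to write its per-context output before the process exits.
// Returns the number of launches issued from the host, or -1 on a CUDA error.
int runAndFlush(int launches) {
    const int n = 1 << 20;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d_data, 0, n * sizeof(float));

    int issued = 0;
    for (int k = 0; k < launches; ++k) {
        scaleKernel<<<(n + 255) / 256, 256>>>(d_data, 1.0001f, n);
        ++issued;
    }

    // Catch launch errors, then block until every queued launch has finished.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    // Tear down the context explicitly instead of relying on process exit.
    cudaDeviceReset();
    return err == cudaSuccess ? issued : -1;
}
```

Calling `runAndFlush(200)` at the end of `main`, before returning, should give the profiler a clean synchronize point for all 200 launches. If the reported call count then matches and stays stable across sessions, an abrupt exit was the likely cause; if it is still low, the CSV files from two sessions would help narrow it down. Newer toolkits also provide `cudaProfilerStop()` in `cuda_profiler_api.h` for flushing profile data explicitly.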