I’m trying to profile my CUDA application with the NVIDIA Visual Profiler (nvvp), but it collects only a subset of the stats for the second and later kernels launched during the run. Why is this, and how can I get the full set of stats for the later kernels?
OS: Ubuntu 11.10
Driver Version: 295.20-0ubuntu1~oneiric~xup1
CUDA Toolkit: 4.1