I’ve encountered a warning followed by incomplete output in the CUDA Visual Profiler. I have been attempting to profile a streaming cuFFT application on a Tesla C1060 using CUDA 4.0. As the attached test case indicates, the profiler produces correct output when the second cudaMemcpyAsync call is removed. When the call is included, however, I receive this warning:
In this profiling session some profiler output rows are dropped due to
incorrect gpu time stamp values and the profiler output is incomplete.
Context_0:
Number of rows dropped = 1
This is followed by:
In this profiling session some profiler output rows are dropped when
combining profiler output from multiple runs. Only the initial matching
profiler output rows are kept. So the profiler output is incomplete.
This can happen when the application execution differs across
multiple runs.
Context_0:
Number of rows in first run = 5
Number of rows dropped = 2
The message area also indicates that runs 2 through 8 give 6 rows rather than the 5 reported in run #1.
I was searching for the same message and came to this post. I encountered the same problem when profiling an OpenCL application in Compute Visual Profiler. Every run from the 2nd onward has about 700 rows, while the 1st run is cut down to 500 rows.
GTX 560Ti, Windows Ultimate 64bit, Compute Visual Profiler version 4.0.17
I have also been receiving the same warning about dropped rows, which makes the profile essentially useless.
In this profiling session some profiler output rows are dropped when combining profiler output from multiple runs. Only the initial matching profiler output rows are kept. So the profiler output is incomplete. This can happen when the application execution differs across multiple runs.
Context_0:
Number of rows in first run = 1543
Number of rows dropped = 1522
I don’t know of any reason that the execution would be substantially different between runs.
I’m using a Quadro 5000 on Ubuntu Server 10.10 with CUDA 4.0.
Try increasing the timeout limit of your session. I’ve found that some profiling runs take longer than others (in some cases much, much longer), and if a run times out, its data is lost.
For example, one program of mine that normally ran in under a minute took over 2000 seconds to finish one of the profiling runs.
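As a rough way to pick a larger timeout, one could time an unprofiled run and scale it up. This is only a sketch: `time_command` and `suggest_timeout` are hypothetical helpers, `./my_app` stands in for your own executable, and the factor of 40 is an illustrative value loosely based on the example above (under a minute growing to over 2000 seconds), not a recommendation.

```shell
# time_command CMD [ARGS...]
# Prints the wall-clock duration, in whole seconds, of one run of CMD.
time_command() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# suggest_timeout BASELINE_SECONDS FACTOR
# Prints BASELINE_SECONDS multiplied by FACTOR. FACTOR is a guess at how
# much slower a profiling run might be than a normal run.
suggest_timeout() {
    echo $(( $1 * $2 ))
}
```

For instance, `suggest_timeout "$(time_command ./my_app)" 40` prints a value in seconds that could be entered as the session timeout. Because `date +%s` has one-second resolution, very short runs will report 0 and need a manual floor.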
CUDA Toolkit 4.0: Please note that a Visual Profiler patch to fix this issue has been posted for Linux. It is available on the NVIDIA Developer Zone; look under the Linux downloads section of the page (search for “Visual Profiler Patch”).
This patch specifically addresses the Visual Profiler issue seen when profiling applications that use multiple streams, in the case where the Visual Profiler reports this error:
“In this profiling session some profiler output rows are dropped due to incorrect gpu time stamp values and the profiler output is incomplete.”
If you have CUDA Toolkit version 4.0.17, install the patch as follows:
1. Rename the existing Visual Profiler executable ($TOOLKIT_DIR points to the directory under which CUDA Toolkit version 4.0.17 is installed):
    cd $TOOLKIT_DIR/computeprof/bin
    mv computeprof computeprof.4.0.17
2. Install the new Visual Profiler executable from the patch:
    cd $TOOLKIT_DIR/computeprof/bin
    tar xvf visualprofiler_4.0.51_linux*.tar.gz
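The two steps above can also be wrapped in a small function that refuses to overwrite an existing backup and restores the original binary if extraction fails. This is a sketch, not part of the patch: `install_profiler_patch` is a hypothetical helper, it assumes the archive is gzip-compressed and unpacks `computeprof` at its top level, and it expects absolute paths for both arguments.

```shell
# install_profiler_patch BIN_DIR PATCH_TARBALL
# BIN_DIR is the directory holding computeprof, e.g. $TOOLKIT_DIR/computeprof/bin.
install_profiler_patch() {
    bin_dir=$1
    tarball=$2
    backup="$bin_dir/computeprof.4.0.17"

    # Stop early if the inputs are not where we expect them.
    [ -f "$bin_dir/computeprof" ] || { echo "no computeprof in $bin_dir" >&2; return 1; }
    [ -f "$tarball" ] || { echo "patch archive not found: $tarball" >&2; return 1; }

    # Keep the first backup: re-running must not overwrite the 4.0.17 binary.
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi

    # Step 1: rename the existing executable.
    mv "$bin_dir/computeprof" "$backup" || return 1

    # Step 2: unpack the patched executable; put the backup back on failure.
    if ! tar xzf "$tarball" -C "$bin_dir"; then
        mv "$backup" "$bin_dir/computeprof"
        return 1
    fi
}
```

It would be called as `install_profiler_patch "$TOOLKIT_DIR/computeprof/bin" /path/to/patch.tar.gz`, where `/path/to/patch.tar.gz` is a placeholder for wherever the downloaded archive was saved.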
GTX 460, CUDA 4.0, Ubuntu 11.04, Driver 270.41.19.
The dropped rows problem was not resolved for me after installing the patch.
The problem only seems to occur if Timestamp is checked (Session Settings->Other Options->Timestamp).