I am trying to figure out at what point, as I increase my simulation size, global memory access bandwidth becomes the limiting factor. The problem is that the CUDA profiler freezes with increasing likelihood the larger I make the simulation. For run times longer than about a minute, there is virtually no chance the profiler will finish without freezing. Is this a known problem? It is extremely frustrating, and I need this information as soon as possible.
Bumpety bump bump.
My approach to measuring performance is to use the CUDA API calls that report the start and end times of a kernel. By running the kernel several times and dumping the timing values from the host program to the console via printf or cout, I get data describing how the kernel performs as I vary the input size or some other parameter.
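For anyone who wants to reproduce this, here is a minimal sketch of that kind of timing loop using CUDA events. The scale() kernel and its launch configuration are just placeholders (my own kernel and sizes are not shown here); swap in whatever you are measuring.

```cpp
// Minimal sketch: time NUM_RUNS launches of a placeholder kernel with CUDA
// events and print one CSV row (best / average / worst runtime, bandwidth).
// Compile with: nvcc -O2 timing.cu
#include <cstdio>
#include <cfloat>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n)          // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                    // one read + one write per element
}

int main()
{
    const int width     = 2048;                    // width in pixels
    const int n         = width * width;
    const size_t nBytes = n * sizeof(float);
    const int NUM_RUNS  = 10;

    float* d_data = nullptr;
    cudaMalloc(&d_data, nBytes);
    cudaMemset(d_data, 0, nBytes);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float best = FLT_MAX, worst = 0.0f, total = 0.0f;
    for (int i = 0; i < NUM_RUNS; ++i) {
        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);                // wait until the kernel is done

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);    // GPU-side elapsed time in ms
        best  = ms < best  ? ms : best;
        worst = ms > worst ? ms : worst;
        total += ms;
    }

    float avgMs = total / NUM_RUNS;
    double bytesPerSec = (double)nBytes / (avgMs * 1e-3);

    printf("%s, %zu, %d, %.3f, %.3f, %.3f, %.2E\n",
           prop.name, nBytes, width, best, avgMs, worst, bytesPerSec);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```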
Here is an example of the resulting output:
Device Name, Size (bytes), Width (pixels), Best runtime (ms), Average runtime (ms), Worst runtime (ms), Average Perf (bytes/s)
Tesla C2050, 262144, 256, 0.111, 0.112, 0.116, 2.34E+09
Tesla C2050, 414736, 322, 0.128, 0.135, 0.141, 3.07E+09
Tesla C2050, 659344, 406, 0.232, 0.232, 0.233, 2.84E+09
Tesla C2050, 1048576, 512, 0.308, 0.308, 0.308, 3.40E+09
Tesla C2050, 1664100, 645, 0.393, 0.393, 0.395, 4.23E+09
Tesla C2050, 2637376, 812, 0.867, 0.868, 0.870, 3.04E+09
Tesla C2050, 4194304, 1024, 1.181, 1.181, 1.182, 3.55E+09
Tesla C2050, 6656400, 1290, 1.771, 1.773, 1.778, 3.75E+09
Tesla C2050, 10562500, 1625, 2.467, 2.470, 2.472, 4.28E+09
Tesla C2050, 16777216, 2048, 3.021, 3.022, 3.024, 5.55E+09
Tesla C2050, 26625600, 2580, 5.116, 5.118, 5.119, 5.20E+09
Tesla C2050, 42250000, 3250, 7.647, 7.648, 7.650, 5.52E+09
Tesla C2050, 67108864, 4096, 9.925, 9.930, 9.933, 6.76E+09
Tesla C2050, 106502400, 5160, 16.941, 16.944, 16.946, 6.29E+09
Tesla C2050, 169052004, 6501, 22.806, 22.916, 22.987, 7.38E+09
Tesla C2050, 268435456, 8192, 39.718, 39.757, 39.826, 6.75E+09
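The Average Perf column is simply the size divided by the average runtime; for example, the first row works out to 262144 bytes / 0.112 ms ≈ 2.34E+09 bytes/s.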
Graphing the resulting performance as runtime vs. size yields this graph:
[Graph: runtime (ms) vs. data size (bytes)]
Graphing the resulting performance as bandwidth (GB/s) vs. size yields this graph:
[Graph: average bandwidth (GB/s) vs. data size (bytes)]
Yes, this isn’t exactly a profiler, but it helps in exploring how a kernel behaves at specific data sizes and in pinpointing issues there, and watching how the numbers trend shows how well a given algorithm scales on a given GPU.
And even if the profiler doesn’t work on your problem, as long as the problem itself still runs without the profiler, you can always fall back on the CPU’s timer when the GPU timing code isn’t working. Just run the code multiple times and compare the best and worst times to gauge the variance and judge how accurate and precise your measurements are. (Obviously, CPU wall-clock time, unlike GPU kernel time, also measures other overhead in the system, but that overhead isn’t optional, so it’s useful to include it in your performance measurements.)
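A rough sketch of that CPU-side fallback, assuming a stand-in kernel and using std::chrono around a synchronized launch (the measured time deliberately includes launch and synchronization overhead):

```cpp
// Wall-clock timing of a kernel launch from the host side.
// Compile with: nvcc -O2 cpu_timing.cu
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel(float* data, int n)   // stand-in for the real kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int n = 1 << 22;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    const int NUM_RUNS = 10;
    double best = 1e30, worst = 0.0;

    for (int i = 0; i < NUM_RUNS; ++i) {
        cudaDeviceSynchronize();                  // make sure the GPU is idle first
        auto t0 = std::chrono::high_resolution_clock::now();

        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaDeviceSynchronize();                  // include kernel + launch overhead

        auto t1 = std::chrono::high_resolution_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        best  = ms < best  ? ms : best;
        worst = ms > worst ? ms : worst;
    }

    // A large spread between best and worst means the measurement is noisy.
    printf("best %.3f ms, worst %.3f ms, spread %.3f ms\n",
           best, worst, worst - best);

    cudaFree(d_data);
    return 0;
}
```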
I hope this information is useful.
-Mike