I'm not sure the timeline data is correct in my CNN inference.

I’m profiling CNN models implemented in PyTorch. I ran inference code on an image dataset through ResNet18 and MobileNetV2, and I’m going to do this with other models, too.
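For context, here is a minimal sketch of the kind of inference loop I’m profiling (the model choice, batch size, and iteration count below are just placeholders, not my exact script):

import torch
import torchvision.models as models

# Placeholder model and input; the real script iterates over an image dataset.
model = models.resnet18(pretrained=True).cuda().eval()
inputs = torch.randn(32, 3, 224, 224).cuda()

with torch.no_grad():
    for _ in range(100):        # placeholder number of batches
        outputs = model(inputs)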

I have been trying several of the profiling tools the CUDA toolkit provides, and I noticed that the data I want is available in Visual Profiler (e.g. Memcpy/Kernel Overlap and Kernel Concurrency are available in GPU Usage mode).

However, I’m not sure this data is reliable. My inference code is really fast (about 10 seconds) when I run it from my console (with the command ‘python inference.py’). However, in the timing result in Visual Profiler, it says it took 349 seconds. Why did this happen?

Also, after execution finishes it always reports an error with the message ‘timeline options cannot be enabled for this profile data and will be ignored’, and sometimes no result appears at all.

My inference code is really fast (about 10 seconds) when I run it from my console (with the command ‘python inference.py’). However, in the timing result in Visual Profiler, it says it took 349 seconds.
Is 10 seconds the application execution time? How did you measure this?
How did you measure the time of 349 seconds in Visual Profiler?
Can you check the kernel duration in Visual Profiler?
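One way to cross-check the end-to-end time on the Python side, independent of the profiler, is to put an explicit synchronize around the timed region, roughly like this (a sketch; ‘model’ and ‘inputs’ are placeholders for your own objects):

import time
import torch

torch.cuda.synchronize()            # make sure no GPU work is still pending
start = time.time()

with torch.no_grad():
    outputs = model(inputs)         # placeholder for the actual inference loop

torch.cuda.synchronize()            # wait for the GPU to finish before stopping the clock
print(f"elapsed: {time.time() - start:.3f} s")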

after execution finishes, it always reports an error with the message ‘timeline options cannot be enabled for this profile data and will be ignored’
Did you use all default options in Visual Profiler, or did you change any options?
Can you check whether you get any error with nvprof?
$ nvprof
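(The full command for a Python script would just be something along these lines; the script name here is taken from your description.)

$ nvprof python inference.py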

@ssatoor

  1. That 10 seconds is such a short time that I could almost count it myself. Of course, for accuracy I’ve also used the Linux time command.
    The 349 seconds is simply what Visual Profiler reports in the timeline view.
    The actual kernel duration is short. To be precise I’d have to review it in the lab, but I remember that most of the 349 seconds was spent in CUDA API calls like cudaMalloc, cudaMemcpy, etc. (see the nvprof sketch after this list).

  2. I don’t think this issue is a major one for me; I can still get the timeline result. I’ll check it later.
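When I’m back in the lab, I could double-check the kernel vs. CUDA API split with nvprof’s trace options, roughly like this (the script name is just an assumption):

$ nvprof --print-gpu-trace python inference.py    # per-kernel and per-memcpy start times and durations
$ nvprof --print-api-trace python inference.py    # durations of CUDA runtime/driver API calls such as cudaMalloc and cudaMemcpy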

Let us know if you have any further questions.
Thanks

My main question is the first point in my second comment: why does Visual Profiler report that these functions need much more time than when I simply run my application without Visual Profiler?

Did you get a chance to try out nvprof?
$ nvprof
Or
$ nvprof --export-profile

It’d be interesting to know where NVVP or nvprof spends most of its time. Using the second nvprof command, i.e. exporting the output to a file, skips the post-processing step, which can take considerable time for large profiles.
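Concretely, something along these lines should work (the output file name is just an example); the exported file can then be imported into Visual Profiler afterwards to inspect the timeline:

$ nvprof --export-profile timeline.nvvp python inference.py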