I'm not sure the timeline data is correct in my CNN inference.

I’m profiling CNN models implemented in PyTorch. I ran inference code on an image dataset through ResNet18 and MobileNetV2, and I’m going to do this with other models, too.
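For context, here is a minimal sketch of the kind of inference loop I’m profiling (the model choice, batch size, and iteration count below are just placeholders, not my exact script):

import torch
import torchvision.models as models

# Placeholder model and input; the real script iterates over an image dataset.
model = models.resnet18(pretrained=True).cuda().eval()
inputs = torch.randn(32, 3, 224, 224).cuda()

with torch.no_grad():
    for _ in range(100):        # placeholder number of batches
        outputs = model(inputs)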

I have been trying several of the profiling tools the CUDA toolkit provides, and I noticed that the data I want is available in Visual Profiler (e.g. Memcpy/Kernel Overlap and Kernel Concurrency are available in GPU Usage mode).

However, I’m not sure this data is reliable. My inference code is really fast (about 10 seconds) when I run it from my console (with the command ‘python inference.py’). However, in the timing result in Visual Profiler, it says it took 349 seconds. Why did this happen?

Also, after execution finishes it always reports an error with the message ‘timeline options cannot be enabled for this profile data and will be ignored’, and sometimes no result appears at all.

My inference code is really fast (about 10 seconds) when I run it from my console (with the command ‘python inference.py’). However, in the timing result in Visual Profiler, it says it took 349 seconds.
Is 10 seconds the application execution time? How did you measure this?
How did you measure the time of 349 seconds in Visual Profiler?
Can you check the kernel duration in Visual Profiler?
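One way to cross-check the end-to-end time on the Python side, independent of the profiler, is to put an explicit synchronize around the timed region, roughly like this (a sketch; ‘model’ and ‘inputs’ are placeholders for your own objects):

import time
import torch

torch.cuda.synchronize()            # make sure no GPU work is still pending
start = time.time()

with torch.no_grad():
    outputs = model(inputs)         # placeholder for the actual inference loop

torch.cuda.synchronize()            # wait for the GPU to finish before stopping the clock
print(f"elapsed: {time.time() - start:.3f} s")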

after execution finishes, it always reports an error with the message ‘timeline options cannot be enabled for this profile data and will be ignored’
Did you use all default options in Visual Profiler, or did you change any options?
Can you check whether you get any error with nvprof?
$ nvprof
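(The full command for a Python script would just be something along these lines; the script name here is taken from your description.)

$ nvprof python inference.py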

@ssatoor

  1. That 10 seconds is such a short time that I could almost count it myself. Of course, for accuracy I’ve also used the Linux time command.
    The 349 seconds is simply what Visual Profiler reports in the timeline view.
    The actual kernel duration is short. To be precise I’d have to review it in the lab, but I remember that most of the 349 seconds was spent in CUDA API calls like cudaMalloc, cudaMemcpy, etc. (see the nvprof sketch after this list).

  2. I don’t think this issue is a major one for me; I can still get the timeline result. I’ll check it later.
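When I’m back in the lab, I could double-check the kernel vs. CUDA API split with nvprof’s trace options, roughly like this (the script name is just an assumption):

$ nvprof --print-gpu-trace python inference.py    # per-kernel and per-memcpy start times and durations
$ nvprof --print-api-trace python inference.py    # durations of CUDA runtime/driver API calls such as cudaMalloc and cudaMemcpy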

Let us know if you have any further questions.
Thanks

My main question is the first point in my second comment: why does Visual Profiler report that these functions need much more time than when I simply run my application without Visual Profiler?

Did you get a chance to try out nvprof?
$ nvprof
Or
$ nvprof --export-profile

It’d be interesting to know where NVVP or nvprof spends most of its time. Using the second nvprof command, i.e. exporting the output to a file, skips the post-processing step, which can take considerable time for large profiles.
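Concretely, something along these lines should work (the output file name is just an example); the exported file can then be imported into Visual Profiler afterwards to inspect the timeline:

$ nvprof --export-profile timeline.nvvp python inference.py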