Inference on an ONNX NN model: per-layer times differ wildly with and without the NVIDIA profiler

I am running inference on a NN model from the ONNX Model Zoo. When I profile it with onnxruntime on just one image, two convolution layers report extremely high throughput.

When I run under the NVIDIA profiler instead, the durations are much closer to the expected values; that is, these two layers behave as expected and take more runtime to do their operations:

Conv Layer 486 kernel time = 1751530 microseconds, input_dim = 1x256x264x200, kernel_filter = 256x256x3x3, output_dim = 1x256x264x200, Throughput = 35.58 GFLOPS
Conv Layer 490 kernel time = 1741836 microseconds, input_dim = 1x256x264x200, kernel_filter = 256x256x3x3, output_dim = 1x256x264x200, Throughput = 35.77 GFLOPS

Now, if I do not run the NVIDIA profiler, these two layers show excessive throughput, and at runtime they are much faster than layers with smaller input_dims, output_dims and kernel_filter:

Conv Layer 486 kernel time = 158 microseconds, input_dim = 1x256x264x200, kernel_filter = 256x256x3x3, output_dim = 1x256x264x200, Throughput = 406 TFLOPS
Conv Layer 490 kernel time = 110 microseconds, input_dim = 1x256x264x200, kernel_filter = 256x256x3x3, output_dim = 1x256x264x200, Throughput = 583 TFLOPS

  • What is happening inside the GPU when I run with the NVIDIA profiler?
  • What is happening on the GPU without any profiling tool, such that the bigger layers execute faster than the smaller ones?
  • Why am I getting TFLOPS figures far above my theoretical peak of 7.046 TFLOPS (single precision)?
  • What is the right way to estimate the FLOPs? Is there a special factor I should divide my estimate by?
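For reference, this is how the per-layer FLOPs in the numbers above can be estimated — a minimal sketch using the standard dense-convolution count (2 FLOPs per multiply-accumulate, bias and padding overhead ignored), with the shapes taken from Conv layers 486/490:

```python
def conv2d_flops(c_in, c_out, kh, kw, h_out, w_out, batch=1):
    """FLOPs for a dense 2D convolution: one multiply + one add (2 FLOPs)
    per kernel element, per input channel, per output element."""
    return 2 * batch * c_out * c_in * kh * kw * h_out * w_out

# Conv 486/490: input 1x256x264x200, kernel 256x256x3x3, output 1x256x264x200
flops = conv2d_flops(c_in=256, c_out=256, kh=3, kw=3, h_out=264, w_out=200)
print(flops)  # 62285414400, i.e. ~62.3 GFLOP per forward pass

# Throughput implied by the measured kernel times:
print(flops / 1_751_530e-6 / 1e9)   # ≈ 35.6 GFLOPS (under the NVIDIA profiler)
print(flops / 158e-6 / 1e12)        # ≈ 394 TFLOPS (without the profiler)
```

This reproduces the ~35.6 GFLOPS figure under the profiler, and confirms that the no-profiler timings imply hundreds of TFLOPS — far beyond the hardware peak — so the question is whether the formula is wrong or the 158 µs measurement is.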

CUDA Version: 12
Driver Version: 525.125.06
OS: Linux
GPU: GTX 1070