Hi, I would like to estimate the inference time of a neural network on a GPU. Is there a formula that gives the inference time from the FLOPs of the neural network, the number of CUDA cores, and the clock frequency of the GPU? Does a similar formula exist for training time?
Are there other GPU characteristics that matter when estimating inference or training time?
Is it reasonable to expect that inference on a GPU with 4000 CUDA cores runs twice as fast as on a GPU with only 2000 CUDA cores?
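For reference, the naive compute-bound formula I have in mind is sketched below. It is a rough lower bound only: it ignores memory bandwidth, kernel launch overhead, and achievable utilization, and the FLOPs-per-core-per-cycle factor of 2 is an assumption (one fused multiply-add per clock).

```python
def naive_inference_time(flops, cuda_cores, clock_hz, flops_per_core_per_cycle=2):
    """Back-of-envelope, compute-bound estimate of one forward pass.

    flops: total floating-point operations for one inference.
    flops_per_core_per_cycle: 2 assumes one FMA (multiply + add) per cycle.
    Real inference time is usually much longer than this lower bound.
    """
    peak_flops_per_s = cuda_cores * clock_hz * flops_per_core_per_cycle
    return flops / peak_flops_per_s

# Example: a 4 GFLOP forward pass on 2000 cores at 1.5 GHz
t = naive_inference_time(4e9, 2000, 1.5e9)  # ~6.7e-4 s lower bound
```

Is this kind of estimate at all meaningful, or are the ignored factors dominant in practice?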
There are many aspects that impact performance during training and inference. One big factor in inference time is the precision you're using. Data-center GPUs like the A100 have much more FP64 throughput available; if your target is GeForce or workstation cards, you will get better performance using single or half precision. Please also keep in mind that, beyond the CUDA core specs, the Tensor Cores on the GPU are specifically suited to training and inference of AI workloads. Here is a blog post about precision as it relates to inference performance:
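To make the precision point concrete, here is a minimal sketch of how an estimate scales with the precision's peak throughput. The TFLOPS numbers in the table are hypothetical placeholders, not the specs of any real GPU; check your card's datasheet for actual values.

```python
# Hypothetical peak-throughput table (TFLOPS) for illustration only.
# On real GPUs the ratios between precisions vary by architecture.
peak_tflops = {
    "fp64": 10,          # double precision (CUDA cores)
    "fp32": 20,          # single precision (CUDA cores)
    "fp16_tensor": 320,  # half precision on Tensor Cores
}

def est_time_s(model_gflops, precision):
    """Compute-bound time estimate: same model FLOPs, different peak rates."""
    return model_gflops / (peak_tflops[precision] * 1000)  # TFLOPS -> GFLOP/s
```

With the same model, the estimate drops in direct proportion to the peak rate of the precision you run at, which is why moving from FP32 to FP16 on Tensor Cores can be such a large win.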
Also, Nvidia offers some tools that help with model design and performance. Check out DL Designer here:
I realize I have complicated the question, and I'm afraid there is no simple answer. I hope that DL Designer can help you plan for inference performance. Nvidia engineers have continued to create optimizations, both in raw performance and in other parts of the inference pipeline, that can have dramatic impacts on the performance of your models. Capabilities such as sparsity, when used from model training through to inference on compatible GPUs, can provide a nice boost in performance. Here is a post about sparsity and how it, along with precision, can impact performance.
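To give a feel for what structured sparsity means, here is a pure-Python sketch of the 2:4 pattern used by Ampere's sparse Tensor Cores: in every group of four weights, keep the two with the largest magnitude and zero the rest. This is an illustration of the pattern only, not NVIDIA's actual pruning tooling.

```python
def prune_2_of_4(weights):
    """Apply 2:4 structured sparsity: per group of 4 weights,
    keep the 2 largest-magnitude values and zero the other 2."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

prune_2_of_4([0.1, -0.9, 0.05, 0.4])  # -> [0.0, -0.9, 0.0, 0.4]
```

On compatible GPUs, the hardware can skip the zeroed weights, which is where the speedup comes from; the training recipe then fine-tunes the remaining weights to recover accuracy.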