Hi, I would like to estimate the inference time of a neural network on a GPU. Is there a formula that gives the inference time from the FLOPs of the neural network, the number of CUDA cores, and the clock frequency of the GPU? Does a similar formula exist for training time?
Are there other GPU characteristics that matter when estimating inference or training time?
Is it reasonable to expect that inference on a GPU with 4000 CUDA cores runs twice as fast as on a GPU with only 2000 CUDA cores?
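For reference, the naive compute-bound formula I have in mind is sketched below. It is a rough lower bound only: it ignores memory bandwidth, kernel launch overhead, and achievable utilization, and the FLOPs-per-core-per-cycle factor of 2 is an assumption (one fused multiply-add per clock).

```python
def naive_inference_time(flops, cuda_cores, clock_hz, flops_per_core_per_cycle=2):
    """Back-of-envelope, compute-bound estimate of one forward pass.

    flops: total floating-point operations for one inference.
    flops_per_core_per_cycle: 2 assumes one FMA (multiply + add) per cycle.
    Real inference time is usually much longer than this lower bound.
    """
    peak_flops_per_s = cuda_cores * clock_hz * flops_per_core_per_cycle
    return flops / peak_flops_per_s

# Example: a 4 GFLOP forward pass on 2000 cores at 1.5 GHz
t = naive_inference_time(4e9, 2000, 1.5e9)  # ~6.7e-4 s lower bound
```

Is this kind of estimate at all meaningful, or are the ignored factors dominant in practice?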
There are many aspects that impact performance during training and inference. One big factor in inference time is the precision you're using. Data-center GPUs like the A100 have much more FP64 throughput available; if your target is GeForce or workstation cards, you will get better performance using single or half precision. Please also keep in mind that, beyond the CUDA core specs, the Tensor Cores on the GPU are specifically suited to training and inference of AI workloads. Here is a blog post about precision as it relates to inference performance:
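To make the precision point concrete, here is a minimal sketch of how an estimate scales with the precision's peak throughput. The TFLOPS numbers in the table are hypothetical placeholders, not the specs of any real GPU; check your card's datasheet for actual values.

```python
# Hypothetical peak-throughput table (TFLOPS) for illustration only.
# On real GPUs the ratios between precisions vary by architecture.
peak_tflops = {
    "fp64": 10,          # double precision (CUDA cores)
    "fp32": 20,          # single precision (CUDA cores)
    "fp16_tensor": 320,  # half precision on Tensor Cores
}

def est_time_s(model_gflops, precision):
    """Compute-bound time estimate: same model FLOPs, different peak rates."""
    return model_gflops / (peak_tflops[precision] * 1000)  # TFLOPS -> GFLOP/s
```

With the same model, the estimate drops in direct proportion to the peak rate of the precision you run at, which is why moving from FP32 to FP16 on Tensor Cores can be such a large win.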
Also, Nvidia offers some tools that help with model design and performance. Check out DL Designer here:
I realize I have complicated the question, and I'm afraid there is no simple answer. I hope that DL Designer can help you plan for inference performance. Nvidia engineers have continued to create optimizations, both in raw performance and in other parts of the inference pipeline, that can have dramatic impacts on the performance of your models. Capabilities such as sparsity, when used from model training through to inference on compatible GPUs, can provide a nice boost in performance. Here is a post about sparsity and how it, along with precision, can impact performance.
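To give a feel for what structured sparsity means, here is a pure-Python sketch of the 2:4 pattern used by Ampere's sparse Tensor Cores: in every group of four weights, keep the two with the largest magnitude and zero the rest. This is an illustration of the pattern only, not NVIDIA's actual pruning tooling.

```python
def prune_2_of_4(weights):
    """Apply 2:4 structured sparsity: per group of 4 weights,
    keep the 2 largest-magnitude values and zero the other 2."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

prune_2_of_4([0.1, -0.9, 0.05, 0.4])  # -> [0.0, -0.9, 0.0, 0.4]
```

On compatible GPUs, the hardware can skip the zeroed weights, which is where the speedup comes from; the training recipe then fine-tunes the remaining weights to recover accuracy.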