Hello, NVIDIA
I am using NVIDIA Triton Inference Server and, in particular, its Model Analyzer tool to estimate the approximate throughput of models running on GPUs.
I find it very convenient because it produces results and plots specific to the GPU it runs on. However, when sweeping many configuration options for a single model, a run can take anywhere from several hours to one or two days.
Is there a way to obtain these numerical results without running on a GPU, or is a GPU strictly required for this kind of profiling?