I have created a TensorRT-optimized image classification model and would like to deploy it to our own locally hosted, GPU-enabled server so that multiple desktop client workstations can make inference calls to it in a production setting. The only ways I have found to deploy a TensorRT model like this are the DGX-1, the DGX Station, or the NVIDIA GPU Cloud, all of which cost tens or hundreds of thousands of dollars and are well out of our budget. Is there a cost-effective way to deploy TensorRT models to our own dedicated server?
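To be concrete about the kind of setup I have in mind, here is a rough sketch: the server loads the TensorRT engine once and exposes a simple HTTP endpoint that the workstations can POST images to. This is only an illustration of the call pattern, not working serving code; the `run_trt_inference` helper and the `/classify` endpoint are placeholders for whatever the real serving solution would provide.

```python
# Sketch of the intended deployment (illustration only, not production code):
# the GPU server hosts the TensorRT engine behind a small HTTP endpoint
# that the desktop client workstations call for classification.
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)


def run_trt_inference(batch: np.ndarray) -> np.ndarray:
    """Placeholder for the TensorRT execution step (engine deserialization,
    device buffer management, execution context, etc.) -- omitted here."""
    raise NotImplementedError


@app.route("/classify", methods=["POST"])
def classify():
    # Each workstation POSTs a JPEG/PNG; decode and preprocess it here.
    # (Exact preprocessing/layout depends on the model, e.g. NCHW vs. NHWC.)
    image = Image.open(io.BytesIO(request.data)).convert("RGB").resize((224, 224))
    batch = np.asarray(image, dtype=np.float32)[np.newaxis] / 255.0
    scores = run_trt_inference(batch)
    return jsonify({"class_id": int(np.argmax(scores))})


if __name__ == "__main__":
    # Listen on the LAN so the client workstations can reach the server.
    app.run(host="0.0.0.0", port=8000)
```

Something hand-rolled like this would cover a quick experiment, but I am looking for a supported, production-grade way to serve the TensorRT model from our own hardware rather than from DGX systems or the NVIDIA GPU Cloud.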