Originally published at: https://developer.nvidia.com/blog/nvidia-serves-deep-learning-inference/
You’ve built, trained, tweaked and tuned your model. You finally create a TensorRT, TensorFlow, or ONNX model that meets your requirements. Now you need an inference solution, deployable to a datacenter or to the cloud. Your solution should make optimal use of the available GPUs to get the maximum possible performance. Perhaps other requirements also…
Greetings.
I see you used a V100 for the demonstration. Concurrency: 8, 2413 infer/sec, latency 26473 usec. It gains a lot of performance when the instance_group is set to 8 (a config sketch follows below).
Have you run the same experiment on a Pascal GPU, say a P40? Would that help too?
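For reference, the instance count mentioned above is set in the model's config.pbtxt in the model repository. Below is a minimal sketch of just the relevant block; the count of 8 simply mirrors the comment, not a general recommendation.

```
# Fragment of a model's config.pbtxt (other required fields omitted).
instance_group [
  {
    # Run 8 execution instances of this model on each available GPU.
    count: 8
    kind: KIND_GPU
  }
]
```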
Would .pb files work for Caffe2 instead of the .netdef model files?
According to one of the authors:
By .pb files I think you mean TensorFlow saved-model format. TRTIS supports many different model formats including TensorFlow saved-model. You wouldn't use a .pb file with Caffe2 since it is not a Caffe2 model format, but TRTIS supports .pb (saved-model) as well as netdef. The full list of supported formats can be found in the GitHub README and linked documentation: https://github.com/NVIDIA/t...
You might also try posting your question on the NVIDIA TensorRT devtalk forum: https://devtalk.nvidia.com/...
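To make that concrete, a model repository can hold both formats side by side. The layout below is only a sketch: the model names are hypothetical, and the expected file names (model.savedmodel, model.netdef, init_model.netdef) should be checked against the documentation linked above.

```
model_repository/
  my_tf_model/              # hypothetical TensorFlow model
    config.pbtxt
    1/
      model.savedmodel/     # TensorFlow SavedModel directory
  my_caffe2_model/          # hypothetical Caffe2 model
    config.pbtxt
    1/
      model.netdef          # serialized Caffe2 predict net
      init_model.netdef     # serialized Caffe2 init (weights) net
```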
The reason I had doubts about Caffe2 + .pb files is that the model zoo provided by Caffe2 contains only .pb files. Here: https://github.com/caffe2/m...
(Dumb question since I'm new to all this.) I'm guessing that even though both Caffe2 and TF can generate .pb files, it doesn't mean the models would have the same structure, right? That is, a TensorFlow model saved as .pb is not equivalent to a Caffe2 model saved as .pb (considering the model itself to be the same).
Also, I am unable to find any tutorial on how to save a model as .netdef in Caffe2, so I'm scratching my head at this point.
Thanks, Loyd
If the model repository contains a lot of models that cannot all be accommodated by the GPU at the same time (due to the GPU memory limit), is there a scheduling policy to load/unload models dynamically? If so, what is the impact on inference latency if a request hits an unloaded model?
You will get quicker responses if you ask your questions in the GitHub project: github.com/triton-inference-server
Triton does not automatically load and unload models, but you can use the model control API to manually load and unload them. See https://github.com/triton-inference-server/server/blob/master/docs/model_management.md
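For illustration, here is a minimal sketch of explicit load/unload using Triton's Python HTTP client. It assumes the server was started with --model-control-mode=explicit, that the client package is installed (pip install tritonclient[http]), and that a model named "my_model" (hypothetical) exists in the model repository.

```python
import tritonclient.http as httpclient

# Connect to a locally running Triton server over HTTP.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Explicitly load the model from the repository into the server.
client.load_model("my_model")
print("loaded:", client.is_model_ready("my_model"))

# Unload it when it is no longer needed, freeing GPU memory.
client.unload_model("my_model")
```

With explicit model control, an inference request does not itself trigger a load, so a model generally has to be loaded before traffic is sent to it.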