Agx Orin - Triton inference

jdelvaux · April 11, 2024, 7:17am

Hello,

I’d like to understand more about how Triton gRPC works. I’m seeing very high network bandwidth usage (around 3 Gbps; I’m using a 10 Gbps switch) when running inference for a segmentation model on a 1080p, 30 FPS input video stream.
I’m trying to make inference on a second Orin run as smoothly as possible.

On a single Orin I get around 50-55 FPS; when running the inference on another Orin (DeepStream pipeline on Orin 1, Triton server on Orin 2) I get about 20-25 FPS.
I’m trying to increase that number, so I need to understand more about gRPC and the bottlenecks.
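
For a rough sense of scale, the figures above can be turned into an average payload per frame. This is only a back-of-the-envelope sketch: it assumes the 3 Gbps is almost entirely inference traffic and ignores gRPC/TCP overhead, and the function name is made up for illustration.

```python
def implied_bytes_per_frame(bandwidth_gbps: float, fps: float) -> float:
    """Average bytes moved per frame for an observed throughput and frame rate."""
    bits_per_second = bandwidth_gbps * 1e9
    return bits_per_second / 8.0 / fps


if __name__ == "__main__":
    # 3 Gbps and 20-25 FPS are the numbers reported in this post.
    for fps in (20.0, 25.0):
        megabytes = implied_bytes_per_frame(3.0, fps) / 1e6
        print(f"{fps:.0f} FPS at 3 Gbps -> about {megabytes:.1f} MB per frame")
```

Under those assumptions the traffic works out to roughly 15 to 19 MB per frame, which is far more than a compressed video frame and is in the range of raw, uncompressed tensors.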

fanzh · April 12, 2024, 1:58am

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)

• DeepStream Version

• JetPack Version (valid for Jetson only)

• TensorRT Version

• NVIDIA GPU Driver Version (valid for GPU only)

• Issue Type (questions, new requirements, bugs)

• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details for reproducing.)

• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or which sample application, and the function description.)

jdelvaux · April 12, 2024, 6:22am

I’d like to understand what the gRPC exchanges between a Triton client and the server contain. I’ve noticed quite a large bandwidth usage, and I’d like to know how to reduce and optimize it.

fanzh · April 13, 2024, 6:48am

In gRPC mode, DeepStream sends the tensors to the Triton server for inference and gets the inference results back over the gRPC protocol. Please refer to the DeepStream doc and the Triton doc.
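
Since whole tensors travel over the wire, the required bandwidth can be estimated from tensor shapes, data types, and frame rate. The sketch below is only an estimate under assumed values: the 3x1080x1920 FP32 input is hypothetical (the real shape is whatever the model declares, after DeepStream preprocessing), the helper names are invented for illustration, and output tensors would add to the total.

```python
from math import prod

# Bytes per element for a few common tensor data types.
DTYPE_BYTES = {"FP32": 4, "FP16": 2, "UINT8": 1}


def tensor_bytes(shape, dtype):
    """Size in bytes of one uncompressed tensor."""
    return prod(shape) * DTYPE_BYTES[dtype]


def stream_gbps(tensors, fps):
    """Bandwidth in Gbps needed to move the given (shape, dtype) tensors once per frame."""
    bytes_per_frame = sum(tensor_bytes(shape, dtype) for shape, dtype in tensors)
    return bytes_per_frame * 8 * fps / 1e9


if __name__ == "__main__":
    # Hypothetical example: one full-resolution FP32 input tensor at 30 FPS.
    hypothetical_input = [((3, 1080, 1920), "FP32")]
    print(f"{stream_gbps(hypothetical_input, 30):.2f} Gbps")
```

With these assumed values a single input tensor already needs close to 6 Gbps at 30 FPS, so multi-Gbps traffic is plausible for a segmentation model fed with uncompressed tensors.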

jdelvaux · April 15, 2024, 6:32am

Hello fanzh,

I would like to emphasize that I’m in a very particular case where I want to perform inference on another Orin.
I need to optimize the bandwidth (+ … ? ) to be able to increase the inference speed.

fanzh · April 15, 2024, 3:19pm

What are the configuration differences between the two tests (50-55 FPS and 20-25 FPS)? Is the 50-55 FPS test using gRPC mode?

jdelvaux · April 16, 2024, 5:58am

Test 1 : Deepstream + Triton Inference on a single Orin: 50-55FPS
Test 2 : Deepstream (Orin1) + Triton Inference (Orin 2): 20-25FPS

In both cases, I’m using GRPC with Triton server.
And my config in deepstream is the same. Yes, I’m using “enable_cuda_buffer_sharing=true”.
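
To put the gap between the two tests in per-frame terms, here is a small sketch. It treats 1/FPS as the average time per frame at steady state, which is a simplification for a pipelined system, and it uses the midpoints of the reported ranges (52.5 and 22.5 FPS) as an assumption.

```python
def frame_time_ms(fps: float) -> float:
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Midpoints of the reported ranges; these exact values are assumptions.
local_ms = frame_time_ms(52.5)   # Test 1: everything on one Orin
remote_ms = frame_time_ms(22.5)  # Test 2: Triton on a second Orin
extra_ms = remote_ms - local_ms

if __name__ == "__main__":
    print(f"local: {local_ms:.1f} ms, remote: {remote_ms:.1f} ms, extra: {extra_ms:.1f} ms")
```

By this estimate the remote setup spends roughly 25 ms more per frame than the single-Orin setup.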

fanzh · April 16, 2024, 6:45am

Are you using custom code, or which DeepStream sample are you testing? Could you share the configuration file and the two complete logs? I’m also wondering about the source type and sink type.
Please make sure the two Orins have the same settings. Please refer to this topic.

jdelvaux · April 17, 2024, 7:10am

> Are you using custom code, or which DeepStream sample are you testing? Could you share the configuration file and the two complete logs? I’m also wondering about the source type and sink type.

Yes, I’m using custom code, but it should be similar to the DeepStream samples.
The project is under NDA, so I can’t share it here in public. Is there another way to share it?

> Please make sure the two Orins have the same settings. Please refer to this topic.

Yes, they are using the same settings.

fanzh · April 17, 2024, 7:37am

“enable_cuda_buffer_sharing=true” is not an acceleration feature on Jetson; please refer to the doc. So the only difference between the two tests is the network transmission.
Are you using a local file or an RTSP stream? Why do you need to deploy gRPC mode across two machines? Note that the tensors are not compressed. The Triton inference API AsyncInfer supports compression, but testing showed no FPS improvement.
For a higher FPS, you can use interval to skip inference on some frames if not all frames need to be inferred. Please look up “interval” in the doc.
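
As a rough illustration of the interval suggestion: assuming interval means the number of consecutive batches skipped between inferences (so interval=1 infers on every other frame), the rate of frames actually sent to Triton would scale as sketched below. The function name is made up for illustration and is not part of DeepStream.

```python
def inferred_fps(stream_fps: float, interval: int) -> float:
    """Frames per second that reach the inference server.

    Assumes `interval` frames are skipped after each inferred frame,
    so one frame out of every (interval + 1) is sent for inference.
    """
    if interval < 0:
        raise ValueError("interval must be >= 0")
    return stream_fps / (interval + 1)


if __name__ == "__main__":
    for interval in (0, 1, 2):
        print(f"interval={interval}: {inferred_fps(30.0, interval):.1f} inferences/s")
```

Under that assumption the tensor traffic drops by the same factor, at the cost of having no fresh inference result for the skipped frames.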

jdelvaux · April 17, 2024, 12:58pm

> “enable_cuda_buffer_sharing=true” is not an acceleration feature on Jetson; please refer to the doc. So the only difference between the two tests is the network transmission.

Good to know, but this was not clear from the doc.

> Are you using a local file or an RTSP stream? Why do you need to deploy gRPC mode across two machines? Note that the tensors are not compressed. The Triton inference API AsyncInfer supports compression, but testing showed no FPS improvement.

The input stream is a local file.
As for the why, I’d be happy to discuss it outside of this forum.

> For a higher FPS, you can use interval to skip inference on some frames if not all frames need to be inferred. Please look up “interval” in the doc.

I’ll look into this and see if it is applicable.

fanzh · April 22, 2024, 2:24pm

Sorry for the late reply. Is this still a DeepStream issue that needs support? Thanks!

jdelvaux · April 22, 2024, 2:32pm

Hello fanzh,

It’s not a “real issue”, more a request for performance advice/improvements.

Are there any other solutions?
Maybe I’ll rephrase: how is it done on dGPU? Are there any tricks to increase the FPS when the Triton server is not on the same machine as DeepStream?

fanzh · April 24, 2024, 9:27am

The DeepStream nvinferserver and Triton code are open source. The main workflow is the same, except for the enable_cuda_buffer_sharing feature on dGPU. Please refer to the nvinferserver doc and the Triton doc. When using gRPC mode on Jetson, tensor transmission is the bottleneck.

jdelvaux · April 29, 2024, 7:53am

Is the HTTP method more optimized?
Or what are my alternatives?

fanzh · April 29, 2024, 9:15am

Triton supports the HTTP method, but nvinferserver only supports native and gRPC modes; please refer to the doc.

jdelvaux · May 6, 2024, 9:34am

OK, then I think I’ll have to change my approach to mitigate this issue.

Thanks for the information.

system · May 20, 2024, 9:34am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
