Slow inference using CUDA and PyTorch on Jetson AGX

manhnd1 · November 2, 2021, 2:00am

Hi NVIDIA,

We’re having a bit of a hiccup using PyTorch + CUDA on Jetson AGX. We have two object tracking models that run with about the same execution time on all our other devices. On the AGX, however, the first model runs perfectly fine and processes very quickly, while the second model is about 8x slower. Is the problem I’m facing the same as in the post “Jetson AGX Xavier: slow inference using CUDA and PyTorch”? And is there any way to speed it up?
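One way to pin down the ~8x figure before digging further is to time both models the same way on every device. Below is a minimal sketch of such a timing harness, assuming each model can be wrapped in a zero-argument callable; `benchmark` and `slowdown` are hypothetical helper names. Because CUDA kernel launches are asynchronous, the optional `sync` hook should be the device synchronization call (for PyTorch, `torch.cuda.synchronize`), and the warm-up calls keep one-off costs such as lazy initialization or JIT optimization during the first runs out of the measurement.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(
    run: Callable[[], object],
    warmup: int = 10,
    iters: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time `run` after `warmup` untimed calls; return latency stats in ms."""
    # Warm-up calls absorb one-off costs (lazy init, JIT optimization, caches).
    for _ in range(warmup):
        run()
    if sync is not None:
        sync()  # let queued warm-up GPU work finish before timing starts

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        if sync is not None:
            # GPU launches are asynchronous: wait so the sample covers the GPU work.
            sync()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def slowdown(slow: dict, fast: dict, key: str = "median_ms") -> float:
    """Ratio of two benchmark results, e.g. model 2 relative to model 1."""
    return slow[key] / fast[key]
```

For example, with placeholder names `model1`, `model2` and input tensor `x`, inside `torch.no_grad()`: `r2 = benchmark(lambda: model2(x), sync=torch.cuda.synchronize)`, likewise `r1` for model 1, then `slowdown(r2, r1)`. Comparing medians rather than means makes the ratio less sensitive to occasional outliers.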

njuffa · November 2, 2021, 2:13am

The last recommendation from NVIDIA in the thread you are pointing to was to install the CUDA profiler and see what it diagnoses the top bottleneck to be. Did you do that and if so, what were the results?
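In practice, “the top bottleneck” usually means the handful of GPU kernels that account for most of the GPU time. A minimal sketch of that ranking step, assuming the per-kernel (name, duration) records have already been exported from the profiler in some form (the export step is left open here, and `top_kernels` is a hypothetical helper, not a profiler API):

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def top_kernels(
    records: Iterable[Tuple[str, float]], n: int = 5
) -> List[Tuple[str, float, int, float]]:
    """Aggregate (kernel_name, duration) records and rank kernels by total time.

    Returns up to `n` tuples of (name, total_duration, call_count, share_of_total).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration in records:
        totals[name] += duration
        counts[name] += 1

    grand_total = sum(totals.values())
    # Largest total time first: these kernels are the bottleneck candidates.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, total, counts[name], total / grand_total if grand_total else 0.0)
        for name, total in ranked[:n]
    ]
```

Running it separately on the model 1 and model 2 traces, or on the same model on a device that behaves normally versus the AGX, shows whether the extra time is concentrated in a few kernels or spread across all of them.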

What are “all other devices”? Are they discrete GPUs or integrated solutions like the Jetson AGX? If it is the former, that would seem to have no bearing on this case, as it would be comparing apples and oranges. If it’s the latter (i.e. other integrated platforms), that could be quite relevant and you might want to list here what integrated platforms work well for this use case.

manhnd1 · November 2, 2021, 3:22am

Regarding “used on all other devices”: I tested on a GTX 1660, GTX 1660 Super, RTX 2070 and Jetson NX, and all of them work normally.
But when I bring my torch_jit_model to the Jetson AGX or an RTX 3060, model 2 runs inference about 8x slower than model 1.
I am trying to use Nsight Systems to check what is happening, but I don’t have any experience with it, so I can’t understand what Nsight Systems reports.
Here are the results from Nsight Systems (run on the 3060; I believe it shows the same behavior as the Jetson AGX):

The first 8 seconds are when I run model 1, and the later times are when I run model 2
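Telling the two models apart in the timeline is easier if each phase is wrapped in a named NVTX range, which Nsight Systems displays as a labeled bar when NVTX tracing is enabled. A minimal sketch, assuming the push/pop functions are passed in so the helper stays independent of PyTorch; `labeled_range` is a hypothetical name, while `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop` are the PyTorch calls one would pass in:

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def labeled_range(
    name: str, push: Callable[[str], object], pop: Callable[[], object]
) -> Iterator[None]:
    """Open a named profiler range for the duration of a `with` block."""
    push(name)
    try:
        yield
    finally:
        pop()  # close the range even if the wrapped inference raises
```

For example: `with labeled_range("model2", torch.cuda.nvtx.range_push, torch.cuda.nvtx.range_pop): out = model2(x)`, where `model2` and `x` are placeholders. The `try`/`finally` keeps ranges correctly paired even when an exception escapes the block.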

manhnd1 · November 2, 2021, 3:26am

manhnd1 · November 2, 2021, 3:27am
