Examples for Deployment of and Inference with Pretrained Custom PyTorch-Based Models on Jetson Orin Nano

Hello,
I am a beginner with embedded programming and systems like Jetson, but I need to deploy my custom-built and trained deep learning segmentation model on a Jetson Orin Nano for integration with a custom imaging platform.

I have written and trained my own pseudo-segmentation model in PyTorch, saved a weights file, and have a general inference script that sends test images to the trained model and returns output masks and evaluation metrics. I am now trying to deploy the model on my Jetson Orin Nano (JetPack 6.2, Ubuntu 22.04) but am having trouble finding good examples or tutorials on how to convert and optimize my weights (currently in .pth format) and files (all preprocessing, model architecture, and inference scripts are .py at the moment). Can someone point me to good examples or workflows for deploying a PyTorch model on a Jetson Orin Nano with the new JetPack 6.2?

Hi,

PyTorch is supported on Jetson, so you can try deploying it directly first.
You can find the package info on the wiki page below:

Next, you can try to convert the model into TensorRT for acceleration.
Some examples can be found below:

A CLI tool to convert the model is also included in JetPack 6.2 (it requires an ONNX model).

$ /usr/src/tensorrt/bin/trtexec --onnx=[model]
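
If you don't have an ONNX file yet, a minimal export sketch along these lines is usually enough (the model class, input shape, and file names here are placeholders for your own, and opset 17 is just an assumption):

import torch
from torch import nn

# Stand-in for your own architecture; replace with your real model class and weights.
class MySegmentationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

model = MySegmentationModel()
# model.load_state_dict(torch.load("weights.pth", map_location="cpu"))  # your .pth file
model.eval()

# Dummy input with the shape your preprocessing produces: (batch, channels, H, W).
dummy_input = torch.randn(1, 3, 512, 512)

# Export to ONNX so trtexec can build a TensorRT engine from it.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)

The resulting model.onnx is what you pass to the trtexec command above.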

Thanks.

Hi ,
Thank you for the info and for verifying PyTorch compatibility. But if I just run the inference script from the CLI, how will the Jetson know to use the GPU and not the CPU? How can I make sure the GPU is being used?

Or is it really as simple as uploading my script, architecture, and weights to the Jetson Orin Nano, then running my inference script from the CLI, with the GPU being used automatically for inference without me doing anything extra?

Hi,

You can control device placement with the PyTorch API.
For example, in model = NeuralNetwork().to(device), the device variable can be cuda or cpu.
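
As a minimal sketch (the model here is a placeholder; your own network and input shapes go in its place):

import torch
from torch import nn

# Placeholder model; use your own segmentation network here.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

# Pick the GPU when it is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

model = NeuralNetwork().to(device)
model.eval()

# Inputs must be moved to the same device as the model.
dummy_batch = torch.randn(1, 3, 512, 512, device=device)
with torch.no_grad():
    mask = model(dummy_batch)
print(mask.shape, mask.device)

If "Running on: cuda" is printed and the output tensor reports device cuda:0, inference is running on the GPU.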

We also have a system monitor for you to check the GPU utilization.

$ sudo -H pip3 install -U jetson-stats
$ jtop

Thanks.

Hi,
I have run the model and specified cuda as the device. I also used the jtop command to monitor GPU usage, and it shows the GPU is being used. I have noticed two big issues:

  1. Inference is still running exceedingly slowly. I am trying to run inference with my trained weights on a dataset of about 1600 images (.tif format), and the estimated time to finish the inference task is almost 10 minutes.
  2. I keep getting a warning “System throttled due to overcurrent”. I stopped my run early because the Jetson became very hot around the time of this warning. I haven't converted to TensorRT format yet, but the overheating doesn't seem good regardless.

What am I supposed to do to increase performance and keep the system from overheating? Could you please assist with this?

To provide more info, I am powering the board with the cord provided with the Jetson Orin Nano Dev Kit and starting in MAXN SUPER mode. It is plugged into an extension cord connected to standard wall power. I don't know if this matters.

Hi,

Could you share the GPU utilization percentage with us?
And the output from tegrastats:

$ sudo tegrastats

Throttling is a mechanism that protects your system by reducing the processor's clock.
So it's okay to see it, and you can turn the warning off with the suggestion below:

Thanks.

I used jtop to watch GPU usage during the inference process and saw that it would usually read nearly 99% GPU, with occasional drops to much lower numbers like 0.3% or 22%. The longer inference runs, the more often I notice GPU utilization straying from near 100%.

tegrastats returns the following output every second with slight changes to values:
04-22-2025 15:29:13 RAM 5193/7620MB (lfb 1x4MB) SWAP 2104/3810MB (cached 0MB) CPU [13%@1728,11%@1728,10%@1728,12%@1728,9%@1344,61%@1344] EMC_FREQ 25%@3199 GR3D_FREQ 0%@[1014] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@54.437C soc2@53.281C soc0@53.281C gpu@56.093C tj@56.093C soc1@53.375C VDD_IN 13395mW/12105mW VDD_CPU_GPU_CV 5582mW/4402mW VDD_SOC 3284mW/3444mW

Hi,

Based on the log, the GPU utilization is 0% (GR3D_FREQ 0%@[1014]).
Are you able to share a log covering a longer period so we can learn more about the behavior?

If you see the GPU utilization drop to zero sometimes, it indicates that the GPU has to wait for the input data (idle), so there is still room for acceleration.

Thanks.

Hi,
My apologies, I should have posted more:

04-23-2025 13:19:15 RAM 6013/7620MB (lfb 2x1MB) SWAP 40/3810MB (cached 0MB) CPU [9%@1728,15%@1728,22%@1728,7%@1728,0%@1036,35%@1036] EMC_FREQ 22%@3199 GR3D_FREQ 99%@[1016] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@50.687C soc2@49.187C soc0@48.906C gpu@51.843C tj@51.843C soc1@49.187C VDD_IN 11099mW/15378mW VDD_CPU_GPU_CV 3883mW/6021mW VDD_SOC 3025mW/4422mW
04-23-2025 13:19:16 RAM 6020/7620MB (lfb 4x512kB) SWAP 40/3810MB (cached 0MB) CPU [13%@729,12%@729,27%@729,31%@729,4%@729,10%@729] EMC_FREQ 37%@3199 GR3D_FREQ 99%@[1008] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@51.125C soc2@49.812C soc0@49.343C gpu@53C tj@53C soc1@49.218C VDD_IN 16876mW/15477mW VDD_CPU_GPU_CV 6681mW/6065mW VDD_SOC 4833mW/4449mW

It does say 99%, but as you saw, there are moments of 0% or very low GPU usage even after data has been loaded and only inference is running. Are you saying it is an issue of data processing prior to input to the model?

Hi,

For example, if the GPU utilization alternates between 0% and 99% (0%, 99%, 0%, 99%, …), it might indicate that the GPU has to wait (idle) for input to process; the app is I/O bound instead of compute bound.

So to improve such a pipeline, you can try moving the preprocessing (bottleneck) step to more powerful hardware (e.g. the GPU or VIC) to see if it helps.
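
As one possible sketch (the dataset and model below are dummy stand-ins, and whether this helps depends on where your actual bottleneck is), you can overlap CPU-side loading with GPU inference by batching, using DataLoader worker processes, and doing the per-pixel math on the GPU:

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Dummy stand-ins; replace with your own .tif-reading Dataset and your own network.
class DummyTifDataset(Dataset):
    def __len__(self):
        return 64
    def __getitem__(self, idx):
        # Keep __getitem__ cheap: just load/decode and return uint8, no heavy math here.
        return torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder for your network

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

loader = DataLoader(
    DummyTifDataset(),
    batch_size=8,       # batch several images instead of one at a time
    num_workers=4,      # decode/preprocess in parallel on the CPU
    pin_memory=True,    # enables faster, asynchronous host-to-device copies
)

with torch.no_grad():
    for batch in loader:
        # Copy asynchronously, then normalize on the GPU instead of the CPU.
        batch = batch.to(device, non_blocking=True).float() / 255.0
        masks = model(batch)

The idea is to keep a batch ready for the GPU at all times so GR3D_FREQ stays close to 99% instead of dipping to 0%.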

Thanks.