Examples for Deployment of and Inference with Pretrained Custom PyTorch-Based Models on Jetson Orin Nano

Hello,
I am a beginner with embedded programming and systems like Jetson, but I need to deploy my custom-built and trained deep learning segmentation model on a Jetson Orin Nano for integration with a custom imaging platform.

I have written and trained my own pseudo-segmentation model in PyTorch, saved a weights file, and have a general inference script that sends test images to the trained model and returns output masks and evaluation metrics. I am now trying to deploy the model on my Jetson Orin Nano (JetPack 6.2, Ubuntu 22.04) but am having trouble finding good examples or tutorials on how to convert and optimize my weights (currently in .pth format) and files (all preprocessing, model architecture, and inference scripts are .py at the moment). Can someone point me to good examples or workflows for deploying a PyTorch model on a Jetson Orin Nano with the new JetPack 6.2?

Hi,

PyTorch is supported on Jetson, so you can try deploying it directly first.
You can find the package info on the wiki page below:

Next, you can try to convert the model into TensorRT for acceleration.
Some examples can be found below:

A CLI tool to convert the model is also included in JetPack 6.2 (it requires an ONNX model).

$ /usr/src/tensorrt/bin/trtexec --onnx=[model]
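
If you don't have an ONNX file yet, a minimal export sketch along these lines is usually enough (the model class, input shape, and file names here are placeholders for your own, and opset 17 is just an assumption):

import torch
from torch import nn

# Stand-in for your own architecture; replace with your real model class and weights.
class MySegmentationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

model = MySegmentationModel()
# model.load_state_dict(torch.load("weights.pth", map_location="cpu"))  # your .pth file
model.eval()

# Dummy input with the shape your preprocessing produces: (batch, channels, H, W).
dummy_input = torch.randn(1, 3, 512, 512)

# Export to ONNX so trtexec can build a TensorRT engine from it.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)

The resulting model.onnx is what you pass to the trtexec command above.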

Thanks.

Hi ,
Thank you for the info and for verifying PyTorch compatibility. But if I just run the inference script from the CLI, how will the Jetson know to use the GPU and not the CPU? How can I make sure the GPU is being used?

Or is it really as simple as uploading my script, architecture, and weights to the Jetson Orin Nano, then running my inference script from the CLI, with the GPU being used automatically for inference without me doing anything extra?

Hi,

You can control device placement with the PyTorch API.
For example, in model = NeuralNetwork().to(device), the device variable can be cuda or cpu.
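
As a minimal sketch (the model here is a placeholder; your own network and input shapes go in its place):

import torch
from torch import nn

# Placeholder model; use your own segmentation network here.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

# Pick the GPU when it is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

model = NeuralNetwork().to(device)
model.eval()

# Inputs must be moved to the same device as the model.
dummy_batch = torch.randn(1, 3, 512, 512, device=device)
with torch.no_grad():
    mask = model(dummy_batch)
print(mask.shape, mask.device)

If "Running on: cuda" is printed and the output tensor reports device cuda:0, inference is running on the GPU.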

We also have a system monitor for you to check the GPU utilization.

$ sudo -H pip3 install -U jetson-stats
$ jtop

Thanks.

Hi,
I have run the model and specified cuda as the device. I also used the jtop command to monitor GPU usage, and it shows the GPU is being used. I have noticed two big issues:

  1. Inference is still running exceedingly slowly. I am trying to run inference with my trained weights on a dataset of about 1600 images (.tif format), and the estimated time to finish the inference task is almost 10 minutes.
  2. I keep getting a warning “System throttled due to overcurrent”. I stopped my run early because the Jetson became very hot around the time of this warning. I haven't converted to TensorRT format yet, but the overheating doesn't seem good regardless.

What am I supposed to do to increase performance and keep the system from overheating? Could you please assist with this?

To provide more info, I am powering the board with the cord provided with the Jetson Orin Nano Dev Kit and starting in MAXN SUPER mode. It is plugged into an extension cord connected to standard wall power. I don't know if this matters.

Hi,

Could you share the GPU utilization percentage with us?
And the output from tegrastats:

$ sudo tegrastats

Throttling is a mechanism that protects your system by reducing the processor's clock.
So it's okay to see it, and you can turn the warning off with the suggestion below:

Thanks.

I used jtop to watch GPU usage during the inference process and saw that it would usually read nearly 99% GPU, with occasional drops to much lower numbers like 0.3% or 22%. The longer inference runs, the more often I notice GPU utilization straying from near 100%.

tegrastats returns the following output every second with slight changes to values:
04-22-2025 15:29:13 RAM 5193/7620MB (lfb 1x4MB) SWAP 2104/3810MB (cached 0MB) CPU [13%@1728,11%@1728,10%@1728,12%@1728,9%@1344,61%@1344] EMC_FREQ 25%@3199 GR3D_FREQ 0%@[1014] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@54.437C soc2@53.281C soc0@53.281C gpu@56.093C tj@56.093C soc1@53.375C VDD_IN 13395mW/12105mW VDD_CPU_GPU_CV 5582mW/4402mW VDD_SOC 3284mW/3444mW

Hi,

Based on the log, the GPU utilization is 0% (GR3D_FREQ 0%@[1014]).
Are you able to share a log covering a longer period so we can learn more about the behavior?

If you see the GPU utilization drop to zero sometimes, it indicates that the GPU has to wait for the input data (idle), so there is still room for acceleration.

Thanks.

Hi,
My apologies, I should have posted more:

04-23-2025 13:19:15 RAM 6013/7620MB (lfb 2x1MB) SWAP 40/3810MB (cached 0MB) CPU [9%@1728,15%@1728,22%@1728,7%@1728,0%@1036,35%@1036] EMC_FREQ 22%@3199 GR3D_FREQ 99%@[1016] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@50.687C soc2@49.187C soc0@48.906C gpu@51.843C tj@51.843C soc1@49.187C VDD_IN 11099mW/15378mW VDD_CPU_GPU_CV 3883mW/6021mW VDD_SOC 3025mW/4422mW
04-23-2025 13:19:16 RAM 6020/7620MB (lfb 4x512kB) SWAP 40/3810MB (cached 0MB) CPU [13%@729,12%@729,27%@729,31%@729,4%@729,10%@729] EMC_FREQ 37%@3199 GR3D_FREQ 99%@[1008] NVDEC off NVJPG off NVJPG1 off VIC off OFA off APE 200 cpu@51.125C soc2@49.812C soc0@49.343C gpu@53C tj@53C soc1@49.218C VDD_IN 16876mW/15477mW VDD_CPU_GPU_CV 6681mW/6065mW VDD_SOC 4833mW/4449mW

It does say 99%, but as you saw, there are moments of 0% or very low GPU usage even after data has been loaded and only inference is running. Are you saying it is an issue of data processing prior to input to the model?

Hi,

For example, if the GPU utilization alternates between 0% and 99% (0%, 99%, 0%, 99%, …), it might indicate that the GPU has to wait (idle) for input to process; the app is I/O bound instead of compute bound.

So to improve such a pipeline, you can try moving the preprocessing (bottleneck) step to more powerful hardware (e.g. the GPU or VIC) to see if it helps.
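
As one possible sketch (the dataset and model below are dummy stand-ins, and whether this helps depends on where your actual bottleneck is), you can overlap CPU-side loading with GPU inference by batching, using DataLoader worker processes, and doing the per-pixel math on the GPU:

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Dummy stand-ins; replace with your own .tif-reading Dataset and your own network.
class DummyTifDataset(Dataset):
    def __len__(self):
        return 64
    def __getitem__(self, idx):
        # Keep __getitem__ cheap: just load/decode and return uint8, no heavy math here.
        return torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder for your network

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

loader = DataLoader(
    DummyTifDataset(),
    batch_size=8,       # batch several images instead of one at a time
    num_workers=4,      # decode/preprocess in parallel on the CPU
    pin_memory=True,    # enables faster, asynchronous host-to-device copies
)

with torch.no_grad():
    for batch in loader:
        # Copy asynchronously, then normalize on the GPU instead of the CPU.
        batch = batch.to(device, non_blocking=True).float() / 255.0
        masks = model(batch)

The idea is to keep a batch ready for the GPU at all times so GR3D_FREQ stays close to 99% instead of dipping to 0%.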

Thanks.