Low Performance - Jetson Orin Nano Super

Hello,

We are running YOLO11n on a Jetson Orin Nano Super with every optimization we know of enabled, including:

  • TensorRT acceleration
  • FP16 precision
  • Max power mode (MAXN)
  • Jetson Clocks

Despite these settings, our system reaches only 20 FPS, even when processing a video file stored on an NVMe SSD and read with OpenCV (cv2). According to the Ultralytics Guide (link to guide), we should be seeing inference times of about 4.9 ms, but ours fluctuate between 7 and 70 ms.

Below is a sample of the output from the inference process:

0: 640x640 1 note, 1 red_robot, 1 blue_robot, 13.5ms
Speed: 3.8ms preprocess, 13.5ms inference, 8.0ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 note, 1 red_robot, 1 blue_robot, 6.8ms
Speed: 3.0ms preprocess, 6.8ms inference, 9.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 note, 1 red_robot, 1 blue_robot, 14.3ms
Speed: 3.8ms preprocess, 14.3ms inference, 16.5ms postprocess per image at shape (1, 3, 640, 640)
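
As a rough cross-check of the numbers above, the per-stage timings can be summed and compared against the observed frame rate. The snippet below is only a sketch: it parses the three `Speed:` lines exactly as pasted, and the helper names (`parse_speed_lines`, `summarize`) are made up for illustration and are not part of Ultralytics. It assumes the logged stages run one after another for each frame.

```python
import re

# The three "Speed:" lines copied from the sample output above.
LOG = """\
Speed: 3.8ms preprocess, 13.5ms inference, 8.0ms postprocess per image at shape (1, 3, 640, 640)
Speed: 3.0ms preprocess, 6.8ms inference, 9.4ms postprocess per image at shape (1, 3, 640, 640)
Speed: 3.8ms preprocess, 14.3ms inference, 16.5ms postprocess per image at shape (1, 3, 640, 640)
"""

SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def parse_speed_lines(log):
    """Return a list of (preprocess, inference, postprocess) tuples in ms."""
    return [tuple(float(v) for v in m.groups()) for m in SPEED_RE.finditer(log)]


def summarize(stages, observed_fps):
    """Average each stage and compare the total with the observed frame time."""
    n = len(stages)
    pre = sum(s[0] for s in stages) / n
    inf = sum(s[1] for s in stages) / n
    post = sum(s[2] for s in stages) / n
    total = pre + inf + post
    frame_budget = 1000.0 / observed_fps  # ms available per frame at this FPS
    return {
        "preprocess_ms": pre,
        "inference_ms": inf,
        "postprocess_ms": post,
        "total_ms": total,
        "implied_fps": 1000.0 / total,
        # Time per frame not covered by the three logged stages
        # (for example video decoding, drawing, or display).
        "unaccounted_ms": frame_budget - total,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_speed_lines(LOG), observed_fps=20.0).items():
        print(f"{key}: {value:.1f}")
```

On these three frames the logged stages add up to roughly 26 ms per frame, which would correspond to about 38 FPS. At the observed 20 FPS (50 ms per frame), that leaves a bit over 20 ms per frame outside the logged stages, and in this sample postprocessing takes about as long as inference itself. Three frames is a very small sample, so this is only a starting point for deciding where to measure next.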

This is our training output:

Running on device: cuda
Ultralytics 8.3.58 🚀 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 7620MiB)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=vision_tracking/scripts/dataset.yaml, epochs=75, time=None, patience=3, batch=4, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=4, project=vision_tracking/runs, name=train, exist_ok=True, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=True, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=vision_tracking/runs/train
Overriding model.yaml nc=80 with nc=3

                   from  n    params  module                                       arguments
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]
  3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
  4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]
  5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
  6                  -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  8                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 10                  -1  1    249728  ultralytics.nn.modules.block.C2PSA           [256, 256, 1]
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13                  -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]
 23        [16, 19, 22]  1    431257  ultralytics.nn.modules.head.Detect           [3, [64, 128, 256]]
YOLO11n summary: 319 layers, 2,590,425 parameters, 2,590,409 gradients, 6.4 GFLOPs

Is there anything we can optimize further?

We would appreciate any guidance.
Team RamFerno 3756

Hi @3756ramferno,
I would recommend raising this query on the Jetson forum.

Thanks

Hey @AakankshaS!

The Jetson Forums haven’t been working for weeks. Is there another place that would be better suited for this question?

Hi @3756ramferno,

There are no issues with the forums; hundreds of topics have been posted in the past few weeks. There must be an issue on your end. What browser and OS are you running?