YOLOv8 model latency on Jetson Orin NX

When I run the YOLOv9 model on my Jetson Orin NX, the latency is at a reasonable level. However, when I run models such as YOLOv8 and YOLOv10, the latency is up to 1 second longer. What could be the reason for this?

I am testing the models in real time with a Raspberry Pi HQ camera connected to my Jetson. Since it is a CSI camera, I provide the camera input through OpenCV with a GStreamer pipeline.

The issue may be due to the difference between running YOLOv9 from its own detection script and running the other models through the Ultralytics library.

How can I solve my problem in this case?

Information about the system:

  • GStreamer version: 1.16.3
  • Ultralytics version: 8.0.113
  • JetPack version: 5.1.1 (L4T R35, REVISION: 3.1)

Hi,

Do you know which backend you use for YOLOv9, YOLOv8 and YOLOv10?
If all of them use the TensorRT backend, models with similar weight sizes should give you similar performance.
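
For reference, an Ultralytics model can be exported to a TensorRT engine and then loaded back for inference; a minimal sketch (the file names are illustrative and the exact export arguments may vary across Ultralytics versions):

from ultralytics import YOLO

# One-time export of the PyTorch weights to a TensorRT engine (run on the Jetson itself)
YOLO("yolov8l.pt").export(format="engine", half=True, imgsz=640)  # writes yolov8l.engine

# Load the engine and run inference through the TensorRT backend
trt_model = YOLO("yolov8l.engine")
results = trt_model.predict("test.jpg", imgsz=640, conf=0.25, verbose=False)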

Please also monitor your device with tegrastats to see whether the GPU is fully occupied.
We expect TensorRT to use ~99% of the GPU during inference.

$ sudo tegrastats

Thanks.

I did my tests with PyTorch as the backend. I used the large weight files for all three YOLO models, so I don't think the problem is that the models are too heavy for the Jetson, especially since the issue only appears with models such as YOLOv8 and YOLOv10. The problem is not the FPS either. When I display the processed image on the Jetson with imshow(), I see the image almost without delay with YOLOv9, but I observe a delay of about 2 seconds with models such as YOLOv8 and YOLOv10. I wonder if anyone else has had this problem and how I can solve it.

What’s the code you used with Ultralytics?

from ultralytics import YOLO
import cv2

model = YOLO('yolov8l.pt')

def gstreamer_pipeline(
    sensor_id=0,
    capture_width=1920,
    capture_height=1080,
    display_width=1920,
    display_height=1080,
    framerate=30,
    flip_method=0,
):
    # CSI camera pipeline: capture with nvarguscamerasrc, convert to BGR for OpenCV
    return (
        "nvarguscamerasrc sensor-id=%d ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            sensor_id,
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
if not cap.isOpened():
    print("GStreamer pipeline could not be opened")
    exit()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # inference
    results = model.predict(frame, device=0, imgsz=640, half=True, conf=0.25, verbose=False)

    annotated_frame = results[0].plot()

    cv2.imshow("YOLOv8 Ultralytics", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

I ran my tests for YOLOv8 and YOLOv10 with this code.

Hi,

Do you use the same source for YOLOv9?

We tried YOLOv11 with the Ultralytics CLI and did not see the display latency you mentioned.
To debug further, could you check whether the latency comes from results[0].plot() or from cv2.imshow()?

Thanks.

I am using the same GStreamer pipeline as the source for YOLOv9. The difference is that I run YOLOv9 through detect.py, which is provided in its own GitHub repository. Can you give me some advice on how to find out which one is causing the delay, results[0].plot() or cv2.imshow()?

Thanks.

What if you display the frame without running predict?
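
For reference, one way to run this check is to show the raw frame with predict disabled, and to time each stage separately; a rough sketch that reuses cap and model from the script above (the timing variables are illustrative):

import time

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Variant 1: comment out predict and show the raw frame only, to rule out the capture path
    # cv2.imshow("raw camera", frame); cv2.waitKey(1); continue

    # Variant 2: time each stage to see where the delay comes from
    t0 = time.perf_counter()
    results = model.predict(frame, device=0, imgsz=640, half=True, conf=0.25, verbose=False)
    t1 = time.perf_counter()
    annotated_frame = results[0].plot()
    t2 = time.perf_counter()
    cv2.imshow("latency check", annotated_frame)
    key = cv2.waitKey(1) & 0xFF
    t3 = time.perf_counter()

    print(f"predict: {t1 - t0:.3f}s  plot: {t2 - t1:.3f}s  imshow+waitKey: {t3 - t2:.3f}s")
    if key == ord('q'):
        break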

Thanks for your suggestion. I’ll test it and get back to you.

Is this still an issue that needs support? Is there any result you can share?

Sorry for the delay. I haven’t been able to test for a while. I haven’t solved the problem yet. I’ll get back to you as soon as possible.

I did the test, and the delay is caused by results[0].plot(). There is a 2-second delay. Can you give me a roadmap for solving this problem?

I can give you some more information. While I don't have this problem with versions such as YOLOv5 and YOLOv9, I do have it with versions such as YOLOv8 and YOLOv10. As far as I know, there are differences in software architecture between these versions. Could my problem be related to that?
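
For reference, one way to confirm whether the drawing step itself is the bottleneck is to skip results[0].plot() and draw the detections with plain OpenCV calls; a rough sketch, assuming the frame, model, and results variables from the script above:

# Draw boxes and labels with plain OpenCV calls instead of results[0].plot()
boxes = results[0].boxes
for xyxy, conf, cls in zip(boxes.xyxy.cpu().numpy(),
                           boxes.conf.cpu().numpy(),
                           boxes.cls.cpu().numpy()):
    x1, y1, x2, y2 = map(int, xyxy)
    label = f"{model.names[int(cls)]} {conf:.2f}"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, max(y1 - 5, 15)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imshow("manual drawing", frame)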

Hi,

Sorry for the late reply.

Assuming you run YOLOv8 and YOLOv5 on the same JetPack version, the CUDA, cuDNN, and TensorRT versions are identical.

So the difference is the Ultralytics source you used.
Since it is a third-party library, we recommend checking with the Ultralytics team for more information.

Thanks.

I had the same problem with the latest Ultralytics version. I found a clue to solving it. In YOLOv9, the image input is provided through dataloaders.py, which does some work with VideoCapture in a background thread to keep the camera input in sync with the model's FPS. I wonder whether this is missing in Ultralytics. The dataloaders.py code I used for YOLOv9 is below.

class LoadStreams:
    # YOLOv5 streamloader, i.e. `python detect.py --source 'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP streams`

    def gstreamer_pipeline(self,
                           sensor_id=0,
                           capture_width=1600,
                           capture_height=900,
                           display_width=1600,
                           display_height=900,
                           framerate=30,
                           flip_method=2,
                           ):
        return (
            "nvarguscamerasrc sensor-id=%d ! "
            "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
            "nvvidconv flip-method=%d ! "
            "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
            "videoconvert ! "
            "video/x-raw, format=(string)BGR ! appsink"
            % (
                sensor_id,
                capture_width,
                capture_height,
                framerate,
                flip_method,
                display_width,
                display_height,
            )
        )

    def __init__(self, sources='streams.txt', img_size=640, stride=32, auto=True, transforms=None, vid_stride=1):
        torch.backends.cudnn.benchmark = True  # faster for fixed-size inference
        self.mode = 'stream'
        self.img_size = img_size
        self.stride = stride
        self.vid_stride = vid_stride  # video frame-rate stride
        sources = Path(sources).read_text().rsplit() if os.path.isfile(sources) else [sources]
        n = len(sources)
        self.sources = [clean_str(x) for x in sources]  # clean source names for later
        self.imgs, self.fps, self.frames, self.threads = [None] * n, [0] * n, [0] * n, [None] * n
        for i, s in enumerate(sources):  # index, source
            # Start thread to read frames from video stream
            st = f'{i + 1}/{n}: {s}... '
            if urlparse(s).hostname in ('www.youtube.com', 'youtube.com', 'youtu.be'):  # if source is YouTube video
                # YouTube format i.e. 'https://www.youtube.com/watch?v=Zgi9g1ksQHc' or 'https://youtu.be/Zgi9g1ksQHc'
                check_requirements(('pafy', 'youtube_dl==2020.12.2'))
                import pafy
                s = pafy.new(s).getbest(preftype="mp4").url  # YouTube URL
            s = eval(s) if s.isnumeric() else s  # i.e. s = '0' local webcam
            if s == 0:
                assert not is_colab(), '--source 0 webcam unsupported on Colab. Rerun command in a local environment.'
                assert not is_kaggle(), '--source 0 webcam unsupported on Kaggle. Rerun command in a local environment.'
            # cap = cv2.VideoCapture(s)
            cap = cv2.VideoCapture(self.gstreamer_pipeline(), cv2.CAP_GSTREAMER)
            assert cap.isOpened(), f'{st}Failed to open {s}'
            w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = cap.get(cv2.CAP_PROP_FPS)  # warning: may return 0 or nan
            self.frames[i] = max(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), 0) or float('inf')  # infinite stream fallback
            self.fps[i] = max((fps if math.isfinite(fps) else 0) % 100, 0) or 30  # 30 FPS fallback

            _, self.imgs[i] = cap.read()  # guarantee first frame
            self.threads[i] = Thread(target=self.update, args=([i, cap, s]), daemon=True)
            LOGGER.info(f"{st} Success ({self.frames[i]} frames {w}x{h} at {self.fps[i]:.2f} FPS)")
            self.threads[i].start()
        LOGGER.info('')  # newline

        # check for common shapes
        s = np.stack([letterbox(x, img_size, stride=stride, auto=auto)[0].shape for x in self.imgs])
        self.rect = np.unique(s, axis=0).shape[0] == 1  # rect inference if all shapes equal
        self.auto = auto and self.rect
        self.transforms = transforms  # optional
        if not self.rect:
            LOGGER.warning('WARNING ⚠️ Stream shapes differ. For optimal performance supply similarly-shaped streams.')

    def update(self, i, cap, stream):
        # Read stream `i` frames in daemon thread
        n, f = 0, self.frames[i]  # frame number, frame array
        while cap.isOpened() and n < f:
            n += 1
            cap.grab()  # .read() = .grab() followed by .retrieve()
            if n % self.vid_stride == 0:
                success, im = cap.retrieve()
                if success:
                    self.imgs[i] = im
                else:
                    LOGGER.warning('WARNING ⚠️ Video stream unresponsive, please check your IP camera connection.')
                    self.imgs[i] = np.zeros_like(self.imgs[i])
                    cap.open(stream)  # re-open stream if signal was lost
            time.sleep(0.0)  # wait time

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        if not all(x.is_alive() for x in self.threads) or cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration

        im0 = self.imgs.copy()
        if self.transforms:
            im = np.stack([self.transforms(x) for x in im0])  # transforms
        else:
            im = np.stack([letterbox(x, self.img_size, stride=self.stride, auto=self.auto)[0] for x in im0])  # resize
            im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW
            im = np.ascontiguousarray(im)  # contiguous

        return self.sources, im, im0, None, ''

    def __len__(self):
        return len(self.sources)  # 1E12 frames = 32 streams at 30 FPS for 30 years
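
For comparison, the same behaviour (always consuming only the newest camera frame) could be approximated around the Ultralytics loop with a small grabber thread; a rough sketch, assuming the gstreamer_pipeline() function and model from the earlier script:

import threading
import cv2

class LatestFrameGrabber:
    # Keeps only the newest frame from the capture, dropping older ones
    def __init__(self, pipeline):
        self.cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
        self.frame = None
        self.lock = threading.Lock()
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running and self.cap.isOpened():
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame

    def read(self):
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def release(self):
        self.running = False
        self.cap.release()

grabber = LatestFrameGrabber(gstreamer_pipeline())
while True:
    frame = grabber.read()
    if frame is None:
        continue
    results = model.predict(frame, device=0, imgsz=640, half=True, conf=0.25, verbose=False)
    cv2.imshow("YOLOv8 Ultralytics", results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
grabber.release()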

Hi,

It looks like this is a question specific to the Ultralytics source.
Have you checked with the team?

Thanks.

Yes, I contacted them and their suggestion worked. If anyone else experiences this problem, the fix is to end the GStreamer pipeline with ! appsink max-buffers=1 drop=True so that old frames are dropped instead of being queued.
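
Applied to the gstreamer_pipeline() function from the code above, the fix only changes the final element of the pipeline string, for example:

def gstreamer_pipeline(sensor_id=0, capture_width=1920, capture_height=1080,
                       display_width=1920, display_height=1080, framerate=30, flip_method=0):
    return (
        "nvarguscamerasrc sensor-id=%d ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! "
        # appsink buffering is the key change: keep at most one buffer and drop stale frames
        "appsink max-buffers=1 drop=True"
        % (sensor_id, capture_width, capture_height, framerate,
           flip_method, display_width, display_height)
    )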
