Yolo11 Object Detection - How to Input Live Video from CAM0?

I am running Yolo11 object detection on my Orin Nano Super dev-kit with JetPack 6.2 installed on an SSD. My system information is here: system_info.txt (1.0 KB)

Yolo11 runs with python3 without issues when the input is an image or an mp4 video. For example, this works just fine and is blazing fast:

from ultralytics import YOLO

model = YOLO("yolo11n.engine")

results = model.predict("pedestrians.mp4", conf=0.25, show=True)

I’m having trouble coding up the case where the input is live video from a CSI camera (cam0) on the Orin dev-kit. What is the Python code for this? Is there also a C/C++ version?

You can pass frames to model.predict() in Ultralytics for prediction, so as long as you have code to grab the frames from the camera, you can run inference on them.

You can also run inference on a webcam with Ultralytics directly.

I’m not sure whether that works for a CSI camera, but you can try source=0.

Hello @xplanescientist,

This Python example will help you capture images from a CSI camera, pass them through the NVIDIA Jetson ISP for debayering and other corrections, and bring them into Python, where you can convert them to OpenCV mats or NumPy arrays and feed them to your AI model following the instructions @Y-T-G provided.

Hope this helps.

Please let us know if you require further support, we would love to help.

import gi
import sys
import signal

gi.require_version('Gst', '1.0')
gi.require_version('GstApp', '1.0')
gi.require_version('GObject', '2.0')
from gi.repository import Gst, GObject, GLib

Gst.init(None)

# Graceful exit using GMainLoop
loop = GLib.MainLoop()

def signal_handler(sig, frame):
    print("Ctrl+C detected. Exiting...")
    loop.quit()

signal.signal(signal.SIGINT, signal_handler)

def on_new_sample(sink, data):
    sample = sink.emit("pull-sample")
    if sample:
        buf = sample.get_buffer()
        caps = sample.get_caps()
        width = caps.get_structure(0).get_value('width')
        height = caps.get_structure(0).get_value('height')
        print(f"Received frame of resolution: {width}x{height}")

        # Map buffer and get data
        result, mapinfo = buf.map(Gst.MapFlags.READ)
        if result:
            frame_data = mapinfo.data  # bytes
            # You could now convert this to a numpy array if needed
            # e.g., using numpy.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))
            buf.unmap(mapinfo)
        return Gst.FlowReturn.OK
    return Gst.FlowReturn.ERROR

def main():
    pipeline_str = (
        "nvarguscamerasrc ! "
        "nvvidconv ! "
        "appsink name=appsink emit-signals=true max-buffers=1 drop=true"
    )

    pipeline = Gst.parse_launch(pipeline_str)
    appsink = pipeline.get_by_name("appsink")
    appsink.connect("new-sample", on_new_sample, None)

    # Start pipeline
    ret = pipeline.set_state(Gst.State.PLAYING)
    if ret == Gst.StateChangeReturn.FAILURE:
        print("Unable to set the pipeline to the playing state.", file=sys.stderr)
        sys.exit(1)

    print("Running... Press Ctrl+C to stop.")
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)
        print("Pipeline stopped.")

if __name__ == '__main__':
    main()

best regards,
Andrew
Embedded Software Engineer at ProventusNova

@proventusnova, I first had to enable the CSI camera with the “/opt/nvidia/jetson-io/jetson-io.py” script. I selected “CSI Camera IMX219 Dual”, saved the configuration, and rebooted.

I verified the video streaming with the “nvgstcapture-1.0” command. All good.

Then I tried your python script, and all it did was print the following. I hit Ctrl+C because it kept repeating the same “Received frame…” line. See below.

Running... Press Ctrl+C to stop.
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3280 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 3280 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 2 
   Output Stream W = 1920 H = 1080 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
Received frame of resolution: 1920x1080
^CCtrl+C detected. Exiting...
Received frame of resolution: 1920x1080
GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
Pipeline stopped.


So I think it’s capturing the frames. What commands can I add to the python script to make it stream the frames in a window?

Thank you.

Hello @xplanescientist,

Great to hear you got the script running after enabling the cameras by applying a device tree overlay with NVIDIA’s jetson-io tool.

Now, for the next steps you will need to focus on this function from the script I shared:

def on_new_sample(sink, data):
    sample = sink.emit("pull-sample")
    if sample:
        buf = sample.get_buffer()
        caps = sample.get_caps()
        width = caps.get_structure(0).get_value('width')
        height = caps.get_structure(0).get_value('height')
        print(f"Received frame of resolution: {width}x{height}")

        # Map buffer and get data
        result, mapinfo = buf.map(Gst.MapFlags.READ)
        if result:
            frame_data = mapinfo.data  # bytes
            # You could now convert this to a numpy array if needed
            # e.g., using numpy.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))
            buf.unmap(mapinfo)
        return Gst.FlowReturn.OK
    return Gst.FlowReturn.ERROR

This line:

print(f"Received frame of resolution: {width}x{height}")

is the one printing all those messages you mentioned. We suggest commenting it out to stop them:

#print(f"Received frame of resolution: {width}x{height}")

Then, you will need to focus on this particular section of the code:

        if result:
            frame_data = mapinfo.data  # bytes
            # You could now convert this to a numpy array if needed
            # e.g., using numpy.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))
            buf.unmap(mapinfo)

As noted by the comments, that is where you will want to add the logic for manipulating the buffers you are capturing from the camera.

For instance, if what you want is to see the frames in a window, you can use OpenCV and change that section of the code to:

if result:
    frame_data = mapinfo.data  # bytes
    
    # Convert to numpy array (RGB format)
    frame_array = np.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))

    # Optional: Convert RGB to BGR if you want correct colors in OpenCV
    frame_bgr = cv2.cvtColor(frame_array, cv2.COLOR_RGB2BGR)

    # Show in OpenCV window
    cv2.imshow("Camera", frame_bgr)
    cv2.waitKey(1)

    buf.unmap(mapinfo)

Please apply the changes and let me know how it goes.

best regards,
Andrew
Embedded Software Engineer at ProventusNova

Tried your code updates, but encountered at least one error. Full code here:

import gi
import sys
import signal
import numpy as np
import cv2


gi.require_version('Gst', '1.0')
gi.require_version('GstApp', '1.0')
gi.require_version('GObject', '2.0')
from gi.repository import Gst, GObject, GLib

Gst.init(None)

# Graceful exit using GMainLoop
loop = GLib.MainLoop()



def signal_handler(sig, frame):
    print("Ctrl+C detected. Exiting...")
    loop.quit()

signal.signal(signal.SIGINT, signal_handler)



def on_new_sample(sink, data):
    sample = sink.emit("pull-sample")
    if sample:
        buf    = sample.get_buffer()
        caps   = sample.get_caps()
        width  = caps.get_structure(0).get_value('width')
        height = caps.get_structure(0).get_value('height')

        ###print("Received frame of resolution: {width}x{height}")

        # Map buffer and get data
        result, mapinfo = buf.map(Gst.MapFlags.READ)

        if result:
            frame_data = mapinfo.data  # bytes
            # You could now convert this to a numpy array if needed
            # e.g., using numpy.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))

            # Convert to numpy array (RGB format)
            frame_array = np.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))

            # Optional: Convert RGB to BGR if you want correct colors in OpenCV
            frame_bgr = cv2.cvtColor(frame_array, cv2.COLOR_RGB2BGR)

            # Show in OpenCV window
            cv2.imshow("Camera", frame_bgr)
            cv2.waitKey(1)

            buf.unmap(mapinfo)
        return Gst.FlowReturn.OK

    return Gst.FlowReturn.ERROR



def main():
    pipeline_str = (
        "nvarguscamerasrc ! "
        "nvvidconv ! "
        "appsink name=appsink emit-signals=true max-buffers=1 drop=true"
    )

    pipeline = Gst.parse_launch(pipeline_str)
    appsink = pipeline.get_by_name("appsink")
    appsink.connect("new-sample", on_new_sample, None)

    # Start pipeline
    ret = pipeline.set_state(Gst.State.PLAYING)
    if ret == Gst.StateChangeReturn.FAILURE:
        print("Unable to set the pipeline to the playing state.", file=sys.stderr)
        sys.exit(1)

    print("Running... Press Ctrl+C to stop.")
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)
        print("Pipeline stopped.")


if __name__ == '__main__':
    main()

But I received the following error. It looks like an array size mismatch?

et@sky:~/robotics/yolo/example01$ python3 run_cam_v02.py 
Running... Press Ctrl+C to stop.
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3280 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 3280 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;
GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 2 
   Output Stream W = 1920 H = 1080 
   seconds to Run    = 0 
   Frame Rate = 29.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
Traceback (most recent call last):
  File "/home/jet/robotics/yolo/example01/run_cam_v02.py", line 46, in on_new_sample
    frame_array = np.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, 3))
ValueError: cannot reshape array of size 64 into shape (1080,1920,3)
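A likely explanation for the 64-byte buffer: without explicit caps between nvvidconv and the appsink, the pipeline can negotiate NVMM (GPU) memory, in which case mapping the buffer yields only a small surface descriptor rather than pixel data. Forcing system memory with a caps string such as `nvvidconv ! video/x-raw,format=BGRx ! appsink` should help, though that exact string is a plausible fix, not verified on this setup. A defensive size check before reshaping, as a sketch (`to_frame` is a hypothetical helper):

```python
import numpy as np

def to_frame(frame_data: bytes, width: int, height: int, channels: int = 3):
    """Reshape raw bytes into an image array, or return None on a size mismatch."""
    expected = width * height * channels
    if len(frame_data) != expected:
        # A tiny buffer (e.g. 64 bytes) suggests the appsink received an NVMM
        # surface handle instead of raw pixels; fix the pipeline caps instead.
        return None
    return np.frombuffer(frame_data, dtype=np.uint8).reshape((height, width, channels))

print(to_frame(b"\x00" * 64, 1920, 1080))                        # None: 64 != 6220800
print(to_frame(b"\x00" * (1920 * 1080 * 3), 1920, 1080).shape)   # (1080, 1920, 3)
```

This makes the failure mode explicit instead of crashing the appsink callback on every frame.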

The approach from @proventusnova did not work, as I described in my previous post. The fix is probably simple, but I could not figure it out.

I tried a much simpler, more straightforward approach using jetson-utils that works nicely. Code as follows:

import sys
from   jetson_utils import videoSource, videoOutput, Log

input  = videoSource()
output = videoOutput()
 
# process frames until EOS or the user exits
while True:
    # capture the next image
    img = input.Capture()

    if img is None: # timeout
        continue  
        
    # render the image
    output.Render(img)

    # exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

The next step is to pass the “img” frame variable to the Ultralytics model.predict() function as @Y-T-G suggested. However, I suspect the jetson_utils frame type is not compatible with the input types Ultralytics expects. Will let you know what happens.

The jetson-utils library worked great for feeding live CAM0 video frames; see the post above. I also have a working python program that runs Yolo11 object detection on an mp4 video file.

Then I tried running object detection on the live camera video stream. I created a new program to feed jetson-utils video frames to the Yolo model “function”. See code below:

import cv2
import numpy as np
from ultralytics import YOLO 

import sys
from   jetson_utils import videoSource, videoOutput, Log

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/example01/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# process frames until EOS or the user exits
while True:
    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # render the image
    #output.Render(frame)

    # exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Perform Yolo object detection on the frame
    results = model(frame)

    for resx in results:
        boxes     = resx.boxes  # Boxes object for bounding box outputs
        masks     = resx.masks  # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs  # Probs object for classification outputs
        obb       = resx.obb  # Oriented boxes object for OBB outputs

        print(boxes)

.
.
However, the program failed; see the results below. The Yolo “model” function expects a frame input, but does not recognize the “frame” object coming from the jetson_utils “input.Capture()” function. How can I reconcile this?

jet@sky:~/robotics/yolo/example01$ python3 detect_v04_camera.py 
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'.
[gstreamer] initialized gstreamer, version 1.20.3.0
[gstreamer] gstCamera -- attempting to create device csi://0
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully created device csi://0
[video]  created gstCamera from csi://0
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://0
     - protocol:  csi
     - location:  0
  -- deviceType: csi
  -- ioType:     input
  -- width:      1280
  -- height:     720
  -- frameRate:  30
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
------------------------------------------------
[OpenGL] glDisplay -- X screen 0 resolution:  3840x2160
[OpenGL] glDisplay -- X window resolution:    3840x2160
[OpenGL] glDisplay -- display device initialized (3840x2160)
[video]  created glDisplay from display://0
------------------------------------------------
glDisplay video options:
------------------------------------------------
  -- URI: display://0
     - protocol:  display
     - location:  0
  -- deviceType: display
  -- ioType:     output
  -- width:      3840
  -- height:     2160
  -- frameRate:  0
  -- numBuffers: 4
  -- zeroCopy:   true
------------------------------------------------
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc0
[gstreamer] gstreamer message stream-start ==> pipeline0
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3280 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3280 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 4 
   Output Stream W = 1280 H = 720 
   seconds to Run    = 0 
   Frame Rate = 59.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
[cuda]   allocated 4 ring buffers (1382407 bytes each, 5529628 bytes total)
[cuda]   allocated 4 ring buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer message latency ==> mysink
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda]   allocated 4 ring buffers (2764800 bytes each, 11059200 bytes total)
Loading /home/jet/robotics/yolo/example01/yolo11n.engine for TensorRT inference...
[07/10/2025-00:43:30] [TRT] [I] Loaded engine size: 5 MiB
[07/10/2025-00:43:30] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[07/10/2025-00:43:31] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +12, now: CPU 0, GPU 14 (MiB)

Traceback (most recent call last):
  File "/home/jet/robotics/yolo/example01/detect_v04_camera.py", line 40, in <module>
    results = model(frame)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 181, in __call__
    return self.predict(source, stream, **kwargs)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 559, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 175, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/home/jet/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 233, in stream_inference
    self.setup_source(source if source is not None else self.args.source)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 205, in setup_source
    self.dataset = load_inference_source(
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/data/build.py", line 199, in load_inference_source
    source, stream, screenshot, from_img, in_memory, tensor = check_source(source)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/data/build.py", line 181, in check_source
    raise TypeError("Unsupported image type. For supported types see https://docs.ultralytics.com/modes/predict")
TypeError: Unsupported image type. For supported types see https://docs.ultralytics.com/modes/predict
[gstreamer] gstCamera -- stopping pipeline, transitioning to GST_STATE_NULL
GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
[gstreamer] gstCamera -- pipeline stopped

np_image = np.array(frame)

should help change the object from a cudaImage to a numpy array. The cudaToNumpy function in jetson_utils was doing the job, but it is broken right now.
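The suggestion works because np.array() accepts any object exposing the array interface, which cudaImage does. A minimal stand-in sketch of the mechanism (CudaImageStandIn is hypothetical, just to show why np.array(frame) produces a host-side copy; the real object lives in jetson_utils):

```python
import numpy as np

class CudaImageStandIn:
    """Hypothetical stand-in for jetson_utils cudaImage: exposes __array__."""
    def __init__(self, height: int, width: int):
        self._data = np.zeros((height, width, 3), dtype=np.uint8)

    def __array__(self, dtype=None):
        # NumPy calls this to obtain the pixel data as an ndarray.
        return self._data if dtype is None else self._data.astype(dtype)

frame = CudaImageStandIn(720, 1280)
np_image = np.array(frame)              # same call Toxite suggests for cudaImage
print(np_image.shape, np_image.dtype)   # (720, 1280, 3) uint8
```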

Good news. I now have a working bare-bones Yolo11 Object Detection python code with video input. It’s very concise and clean but very powerful. Deployed it on my Orin Nano devkit with JetPack 6.2.1.

The cudaToNumpy function did the job! The jetson-inference utilities were installed when I built the Hello AI World project from source, so they were there all along.

Working python code shown below. But more features needed.

import numpy as np
from ultralytics import YOLO 

import sys
from jetson_utils import videoSource, videoOutput, Log
from jetson_utils import cudaToNumpy

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/example01/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# Run Inference on Video Frames
while True:

    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # Render the image
    output.Render(frame)

    # Exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Convert Jetson Cuda image to Numpy array  
    frame_numpy = cudaToNumpy(frame)

    # Run Yolo Inference
    results = model(frame_numpy)
 
    for resx in results:
        boxes     = resx.boxes      # Boxes object for bounding box outputs
        masks     = resx.masks      # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs      # Probs object for classification outputs
        obb       = resx.obb        # Oriented boxes object for OBB outputs

The next feature I need is overlaying the Yolo bounding boxes on the jetson_utils output window. The output window currently shows the raw video frames, and it’s very fast! But I need those bounding boxes. Any ideas?

Hi,

There is a DrawRect function in jetson-utils.
Could you check if this can meet your requirements?

Thanks.
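For reference, the drawing primitive in jetson-utils is cudaDrawRect(img, (left, top, right, bottom), (r, g, b, a)). The helper below (boxes_to_rects is a hypothetical name, not part of either library) sketches one way to turn a YOLO xyxy box array into the integer rect tuples that call expects; the jetson_utils call itself is commented out since it needs a Jetson to run:

```python
import numpy as np
# from jetson_utils import cudaDrawRect  # available on the Jetson

def boxes_to_rects(xyxy):
    """Convert an (N, 4) array of [x1, y1, x2, y2] boxes to int rect tuples."""
    return [tuple(int(v) for v in row) for row in np.asarray(xyxy)]

# Example: two detections shaped like results[0].boxes.xyxy after .cpu().numpy()
rects = boxes_to_rects(np.array([[10.4, 20.9, 110.0, 220.5],
                                 [300.0, 40.0, 420.0, 180.0]]))
print(rects)  # [(10, 20, 110, 220), (300, 40, 420, 180)]

# for rect in rects:
#     cudaDrawRect(frame, rect, (0, 255, 0, 150))  # green, semi-transparent
```

Drawing directly on the cudaImage avoids the round-trip through resx.plot() and cudaFromNumpy.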

Good news. The Yolo11 object detection python code is working on the Orin Nano Super devkit with JetPack 6.2.1 (System Info). While the code is only half a page long, it streams object detection results and draws bounding boxes in the output window extremely fast. Inference time is ~9 ms, and the output streaming is smooth and swift.

The code makes use of Ultralytics’ Yolo library for inference (installation here) and jetson-utils for video input and output. That’s it. The hard part was figuring out the frame data type conversions between Yolo and jetson-utils. A big thanks to @Toxite.

from ultralytics import YOLO 

from jetson_utils import videoSource, videoOutput, Log
from jetson_utils import cudaToNumpy
from jetson_utils import cudaFromNumpy

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/networks/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# Run Inference on Video Frames
while True:

    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # Exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Convert Jetson Cuda image to Numpy array  
    frame_numpy = cudaToNumpy(frame)

    # Run Yolo Inference
    results = model(frame_numpy)  #, show=True)
 
    for resx in results:
        boxes     = resx.boxes      # Boxes object for bounding box outputs
        masks     = resx.masks      # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs      # Probs object for classification outputs
        obb       = resx.obb        # Oriented boxes object for OBB outputs

        # Display image and bounding box in Jetson_Utils output window
        output.Render(cudaFromNumpy(resx.plot()))

        print(boxes)   # Stream object detection results

Tip: the Ultralytics forum is very helpful for Yolo questions. I strongly encourage using it.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.