Jetson-Inference questions (detection delay and jetson_utils.videoSource reconnect)

I'm using jetson-inference and have run into some strange issues:

  1. camera1 = jetson_utils.videoSource('rtsp://…@', argv=['--input-codec=h265', …])

How can I detect that the stream is broken, and how do I reconnect, for example after network problems?

Maybe someone has a more elegant solution than this?

    try:
        if camera1.IsStreaming(): # check whether camera 1 is streaming
            img1 = camera1.Capture() # capture an image from camera 1
        else:
            add_log("Camera 1 is not streaming")
            try:
                camera1.Close() # close camera 1
            except Exception:
                add_log("Camera 1 could not be closed")
            try:
                camera1.Open() # open camera 1 again
            except Exception:
                add_log("Camera 1 could not be re-opened")
    except Exception:
        add_log("Capturing frames from camera 1 failed")
  2. It seems to me that the detection engine lags behind the real image frames. It's like the image data fed to detection is always a few frames behind.



1. You can find more jetson-utils samples below.
Maybe you can add a handler for when the img returned by Capture() is None, as in the sketch after this list:
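
A minimal sketch of such a handler (assuming a recent jetson-utils where Capture() accepts a timeout in milliseconds and returns None when no frame arrives; the URL and the retry threshold are placeholders):

    import jetson_utils

    camera1 = jetson_utils.videoSource('rtsp://user:pass@host/stream',  # placeholder URL
                                       argv=['--input-codec=h265'])
    failures = 0
    while True:
        img = camera1.Capture(timeout=1000)  # returns None if no frame arrived in time
        if img is None:
            failures += 1
            add_log(f"camera 1 capture timed out ({failures}x)")
            if failures >= 10:      # placeholder threshold before forcing a reconnect
                camera1.Close()     # tear the stream down
                camera1.Open()      # and try to bring it back up
                failures = 0
            continue
        failures = 0
        # ... run detection on img ...

Allowing a few timeouts before calling Close()/Open() avoids thrashing the connection while the network recovers.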

2. Would you mind sharing a longer video?
Detection is applied per frame, so there should not be a delay.



There are two videos.

One is when the CPU is almost 100% used and the other is when it's only about 50% used (it depends on how many video feeds I request).

There you can see how the bounding boxes are a few frames behind the actual video.

The script itself is:

    while True:
        try:
            input_img1 = camera1.Capture() # capture an image from camera 1
        except Exception:
            add_log("Capturing frame from cam 1 failed")
            continue

        if config.detector == 1:
            dcounter = 0 # reset the detection counter
            detections1 = net1.Detect(input_img1, overlay="box,labels,conf") # detect objects on the image
            cudaMemcpy(out_img1, input_img1) # copy the image (dst, src)
            for detection1 in detections1:
                dcounter += 1 # add one to the counter
                mvcam = 1 # camera ID 1
                dheight = int(detection1.Top)
                dright = int(detection1.Right)
                dleft = int(detection1.Left)
                dbottom = int(detection1.Bottom)
                dclassid = int(detection1.ClassID)
                dconfidence = round(detection1.Confidence, 2) # two decimals is precise enough
                dobjectid = int(detection1.TrackID) # object ID from the jetson-inference tracker
                filename = f'{image_folder}/{mvid}-1-{did}-{dsid}-{dclassid}.jpg'
                filename_to_db = f'/data/defect/{today}/{mvid}-1-{did}-{dsid}-{dclassid}.jpg'
                for i in range(3):
                    # iterate in reverse so pop() does not shift indices we still have to visit
                    for j in reversed(range(len(detection_array))):
                        if (detection_array[j][2] == (dsid - i) and
                                detection_array[j][1] == mvcam and
                                detection_array[j][5] == dclassid and
                                abs(detection_array[j][8] - dleft) <= 200 and
                                abs(detection_array[j][9] - dheight) <= 200 and
                                abs(detection_array[j][10] - dright) <= 200 and
                                abs(detection_array[j][11] - dbottom) <= 200):
                            detection_array.pop(j) # remove similar objects from the array
                            continue
                        for dcls in allClasses: # find longitudinal pointed stripe defects and remove duplicates
                            if dcls[0] == dclassid and dcls[1] == "1": # class found and it is a longitudinal pointed stripe defect
                                detection_array.pop(j) # delete the duplicate detection from the array
                                break
                detection_array.append([mvid, mvcam, dsid, did, dcounter, dclassid, dconfidence, dobjectid, dleft, dheight, dright, dbottom, 0, 0, filename_to_db, "", input_img1, filename]) # add the detection to the array

The stream is served via a Flask app:

    # Define the route for streaming video
    def video_feed1():
        add_log("video_feed1 is called.")
        # Return a Response object with the generator that converts the frames to JPEG format
        return Response(generate_frames(out_img1),
                        mimetype='multipart/x-mixed-replace; boundary=frame')

and the frames are generated by:

    # Define the function that captures video frames and converts them to JPEG format
    def generate_frames(ximg):
        while True:
            vimg1 = cudaToNumpy(ximg) # convert the cudaImage to a numpy array
            vimg1 = cv2.cvtColor(vimg1, cv2.COLOR_RGB2BGR) # convert RGB to BGR to avoid a blue-tinted image
            # Convert the numpy array to JPEG format
            ret, jpeg = cv2.imencode('.jpg', vimg1)
            if not ret:
                continue # skip frames that failed to encode
            # Yield the JPEG data as a byte string
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + jpeg.tobytes() + b'\r\n')
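
For completeness: the excerpt does not show how the route is registered; a minimal sketch of the missing wiring (the app name, route path, and port are assumptions, not from the original post):

    from flask import Flask, Response

    app = Flask(__name__)
    # registers video_feed1() above as an endpoint, equivalent to an @app.route decorator
    app.add_url_rule('/video_feed1', view_func=video_feed1)  # assumed route path

    if __name__ == '__main__':
        # threaded=True lets the capture loop and the MJPEG stream run concurrently
        app.run(host='0.0.0.0', port=5000, threaded=True)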

And here is the second video (in the last post, see only the last one), where you can see how the bounding box is "floating".

GPU usage was about 60-80% when viewing via jtop.


    # let's define where to copy the images

    out_img1 = cudaAllocMapped(width=1920, height=1080, format='rgb8') # allocate memory for output image stream 1

    out_img2 = cudaAllocMapped(width=1920, height=1080, format='rgb8') # allocate memory for output image stream 2

    out_img3 = cudaAllocMapped(width=1920, height=1080, format='rgb8') # allocate memory for output image stream 3

    out_img4 = cudaAllocMapped(width=1920, height=1080, format='rgb8') # allocate memory for output image stream 4


    # let's load camera 1 and define its focuser

    camera1 = jetson_utils.videoSource("csi://0", argv=["--input-width=1920", "--input-height=1080", "--input-rate=30"]) # camera 1 - Capture() returns a cudaImage

    focuser1 = Focuser(30)

    # let's load camera 2 and define its focuser

    camera2 = jetson_utils.videoSource("csi://1", argv=["--input-width=1920", "--input-height=1080", "--input-rate=30"]) # camera 2

    focuser2 = Focuser(32)

    # let's load camera 3 and define its focuser

    camera3 = jetson_utils.videoSource("csi://2", argv=["--input-width=1920", "--input-height=1080", "--input-rate=30"]) # camera 3

    focuser3 = Focuser(34)

    # let's load camera 4 and define its focuser

    camera4 = jetson_utils.videoSource("csi://3", argv=["--input-width=1920", "--input-height=1080", "--input-rate=30"]) # camera 4

    focuser4 = Focuser(35)


    net1 = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/ssd-mobilenet-v2.onnx', '--labels=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.3', '--input-width=1920', '--input-height=1080', '--input-rate=30', '--tracking=True', '--tracker=KLT', '--tracker-min-frames=1', '--tracker-lost-frames=5', '--tracker-overlap=0.5', '--clustering=0.5', '--batch_size=2'])

    net2 = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/ssd-mobilenet-v2.onnx', '--labels=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.3', '--input-width=1920', '--input-height=1080', '--input-rate=30', '--tracking=True', '--tracker=KLT', '--tracker-min-frames=1', '--tracker-lost-frames=5', '--tracker-overlap=0.5', '--clustering=0.5', '--batch_size=2'])

    net3 = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/ssd-mobilenet-v2.onnx', '--labels=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.3', '--input-width=1920', '--input-height=1080', '--input-rate=30', '--tracking=True', '--tracker=KLT', '--tracker-min-frames=1', '--tracker-lost-frames=5', '--tracker-overlap=0.5', '--clustering=0.5', '--batch_size=2'])

    net4 = jetson_inference.detectNet(argv=['--model=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/ssd-mobilenet-v2.onnx', '--labels=/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512/labels.txt', '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes', '--confidence=0.3', '--input-width=1920', '--input-height=1080', '--input-rate=30', '--tracking=True', '--tracker=KLT', '--tracker-min-frames=1', '--tracker-lost-frames=5', '--tracker-overlap=0.5', '--clustering=0.5', '--batch_size=2'])
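
Since all four networks are created with identical arguments, a loop would avoid repeating the long argv list. A sketch, using the model paths from the post:

    model_dir = '/home/visioline/install/jetson-inference-devit/python/training/detection/ssd/models/jw512'
    detect_argv = [f'--model={model_dir}/ssd-mobilenet-v2.onnx', f'--labels={model_dir}/labels.txt',
                   '--input-blob=input_0', '--output-cvg=scores', '--output-bbox=boxes',
                   '--confidence=0.3', '--input-width=1920', '--input-height=1080', '--input-rate=30',
                   '--tracking=True', '--tracker=KLT', '--tracker-min-frames=1',
                   '--tracker-lost-frames=5', '--tracker-overlap=0.5', '--clustering=0.5',
                   '--batch_size=2']
    # four independent detector instances, one per camera
    net1, net2, net3, net4 = (jetson_inference.detectNet(argv=detect_argv) for _ in range(4))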

Maybe this information is needed…

Hi @raul.orav, it appears that you have tracking enabled, which can introduce some lag in the bounding boxes when a track is lost (it keeps lost tracks for --tracker-lost-frames=N frames in case the track is regained). I'd recommend trying to disable tracking and see if the delay goes away.
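
For an A/B test, the tracking flags can be filtered out of the argv list. A sketch, reusing the detect_argv list from the refactor above:

    # same arguments as before, but with the tracking flags removed
    tracking_flags = {'--tracking=True', '--tracker=KLT', '--tracker-min-frames=1',
                      '--tracker-lost-frames=5', '--tracker-overlap=0.5'}
    net1 = jetson_inference.detectNet(argv=[a for a in detect_argv if a not in tracking_flags])

Alternatively, lowering --tracker-lost-frames from 5 toward 1 shortens how long a lost box lingers on screen.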

I found it: the KLT tracker does this, and the IOU tracker does too. My bad, I didn't realize the tracker worked this way.

Yes, this is it. I didn't think about it. My mistake, sorry.

No worries, glad you found the culprit. I did recently add some basic documentation on the tracking in jetson-inference. I will say, however, that DeepStream has much better tracking algorithms available.


With DeepStream, TAO, etc., the main problem is that it's hard to find the right manual or documentation. I played with some Docker applications but still haven't gotten the right feel for it.

Actually, I'd need a good training environment for building TAO models and INT8, but while playing with some Docker containers one problem followed another :)

I think there should be something like CVAT + pytorch-ssd + TensorBoard all in one, but I haven't found it :)

I spent some hours with DIGITS, but it's too old…

Adding the CVAT package to the jetson-inference container could be an interesting solution; then you could run the CVAT annotation server/webpages right there from your Jetson. I will make a note to discuss with the CVAT team whether they've run their server on ARM64 platforms before.


CVAT is mature, tested, and still maintained; it's not from NVIDIA, but in my opinion it's quite a good product.

I also think the training process should be simpler: I imagine that after finishing the training- and validation-dataset annotation, it should take a few mouse clicks and you should see the training progress, the training data, etc.

I think from NVIDIA's point of view it could be wise to build a CVAT add-on module where you can train models for DeepStream, so you can build a model on a server/PC/NVIDIA graphics card and use it on a Jetson.

For example, if your model is larger and the dataset contains 5000+ images, even the NX is a bit slow.

I think if CVAT let you train INT8 models for DeepStream, use them for auto-annotation, and watch a training dashboard, it would be a dream come true :)
