Digital Zoom & Pan Using Jetson-Utils

Hi all,
I wanted to reach out to see if I could get some assistance implementing a digital zoom & pan function in a proof-of-concept Python application I am working on.

I am currently using jetson-utils, as it has been fantastic at getting me the frame rate and latency I need at the resolution I am aiming for. I am displaying two camera streams side by side on my screen, with each stream taking up exactly half the screen:

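import jetson_utils
import keyboard
from jetson_utils import videoSource, videoOutput, Log

# each stream fills half of a 3840x2160 display (integer division keeps the width an int)
roi_width = 3840 // 2
roi_height = 2160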


input1 = videoSource("csi://0",options={'width':4056,'height':3040,'framerate':60})
input2 = videoSource("csi://1",options={'width':4056,'height':3040,'framerate':60})    


output = videoOutput()   


try:
  numFrames = 0
  while True:
    # capture the next image
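    # capture the next image from each camera
    img = input1.Capture(format='rgb8')
    img2 = input2.Capture(format='rgb8')

    if img is None or img2 is None:  # timeout
        continue

    # centered crop region as (left, top, right, bottom)
    crop_w, crop_h = int(roi_width), int(roi_height)
    left = (img.width - crop_w) // 2
    top = (img.height - crop_h) // 2
    crop_roi = (left, top, left + crop_w, top + crop_h)

    # one crop-sized buffer per camera plus the side-by-side output frame
    # (allocated per frame to keep this short; these could be hoisted out of the loop)
    crop1 = jetson_utils.cudaAllocMapped(width=crop_w, height=crop_h, format=img.format)
    crop2 = jetson_utils.cudaAllocMapped(width=crop_w, height=crop_h, format=img2.format)
    imgOutput = jetson_utils.cudaAllocMapped(width=crop_w * 2, height=crop_h, format=img.format)

    # crop the center of each frame, then place the crops side by side
    jetson_utils.cudaCrop(img, crop1, crop_roi)
    jetson_utils.cudaCrop(img2, crop2, crop_roi)
    jetson_utils.cudaOverlay(crop1, imgOutput, 0, 0)
    jetson_utils.cudaOverlay(crop2, imgOutput, crop_w, 0)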
        
    if numFrames % 25 == 0 or numFrames < 15:
        Log.Verbose(f"video-viewer:  captured {numFrames} frames ({img.width} x {img.height})")
	
    numFrames += 1
	
    # render the image
    output.Render(imgOutput)
    

    # update the title bar
    output.SetStatus("Video Viewer | {:d}x{:d} | {:.1f} FPS".format(img.width, img.height, output.GetFrameRate()))
	
    # exit on input/output EOS
    if not input1.IsStreaming() or not input2.IsStreaming() or not output.IsStreaming():
        break

    if keyboard.is_pressed('q'):
        break  # exit the loop if 'q' is pressed
finally:
    input1.Close()
    input2.Close()
    output.Close()

I was able to implement a (very poor) solution using OpenCV & GStreamer, but the performance is abysmal:

import cv2

camera1_source = "nvarguscamerasrc sensor-id=0 sensor-mode=0 tnr-mode=1 tnr-strength=1 ! video/x-raw(memory:NVMM), width=4056, height=3040, format=NV12 ! nvvidconv set-timestamps=true ! video/x-raw, format=BGRx ! videoconvert ! appsink"
camera2_source = "nvarguscamerasrc sensor-id=1 sensor-mode=0 tnr-mode=1 tnr-strength=1 ! video/x-raw(memory:NVMM), width=4056, height=3040, format=NV12 ! nvvidconv set-timestamps=true ! video/x-raw, format=BGRx ! videoconvert ! appsink"

# Initialize camera capture
cap2 = cv2.VideoCapture(camera1_source, cv2.CAP_GSTREAMER)
cap1 = cv2.VideoCapture(camera2_source, cv2.CAP_GSTREAMER)

if not cap1.isOpened() or not cap2.isOpened():
    print("Error opening cameras")
    exit(1)
    
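frame_width = int(cap1.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap1.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Display resolution (assumed 3840x2160, matching the jetson-utils snippet above)
disp_width, disp_height = 3840, 2160

# Each camera's ROI fills half the display width
roi_width = disp_width // 2
roi_height = disp_height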

# Calculate the starting point for the ROI to capture the center
roi_x = int((frame_width - roi_width) / 2)
roi_y = int((frame_height - roi_height) / 2)
# Initialize zoom and pan variables for both cameras
zoom_level = 1.0  # Start with no zoom
pan_x, pan_y = 0, 0  # Start with no panning
pan_y2 = 55.0  # Added vertical offset to deal with camera misalignment
def process_frame(frame, zoom_level, pan_x, pan_y, disp_width, disp_height):
    # Apply zoom by resizing the frame
    zoomed_width = int(frame.shape[1] * zoom_level)
    zoomed_height = int(frame.shape[0] * zoom_level)
    zoomed_frame = cv2.resize(frame, (zoomed_width, zoomed_height), interpolation=cv2.INTER_LINEAR)

    # Calculate the centered ROI coordinates after zoom
    center_x = int(zoomed_width / 2)
    center_y = int(zoomed_height / 2)

    # Calculate half size of the display for cropping
    half_disp_width = disp_width // 4  # Divide by 4 to get half of half display
    half_disp_height = disp_height // 2

    # Calculate the ROI origin: centered on the zoomed frame, shifted by the pan offset
    start_x = center_x - half_disp_width + int(pan_x * zoom_level)
    start_y = center_y - half_disp_height + int(pan_y * zoom_level)

    # Clamp the ROI so it does not exceed the zoomed frame bounds
    start_x = min(max(0, start_x), zoomed_width - 2 * half_disp_width)
    start_y = min(max(0, start_y), zoomed_height - 2 * half_disp_height)
    end_x = start_x + 2 * half_disp_width
    end_y = start_y + 2 * half_disp_height

    # Crop the zoomed frame to the ROI
    roi = zoomed_frame[start_y:end_y, start_x:end_x]

    return roi

while True:
    ret1, frame1 = cap2.read()
    ret2, frame2 = cap1.read()

    if not ret1 or not ret2:
        print("Error reading frames")
        break
    
    # Process frames using the process_frame function
    roi1 = process_frame(frame1, zoom_level, pan_x, pan_y, disp_width, disp_height)
    roi2 = process_frame(frame2, zoom_level, pan_x, pan_y + pan_y2, disp_width, disp_height)  # pan_y2 is the fixed alignment offset
    
    # Concatenate processed frames side by side
    combined_frame = cv2.hconcat([roi1, roi2])
    cv2.imshow("csi_cam", combined_frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('z'):
        zoom_level *= 1.1  # Zoom in
    elif key == ord('x'):
        zoom_level = max(0.8, zoom_level / 1.1)  # Zoom out, ensuring not to go below 0.8
    elif key == ord('w'):  # Pan up
        pan_y -= 10
    elif key == ord('s'):  # Pan down
        pan_y += 10
    elif key == ord('a'):  # Pan left
        pan_x -= 10
    elif key == ord('d'):  # Pan right
        pan_x += 10

cap1.release()
cap2.release()
cv2.destroyAllWindows()

I cannot seem to wrap my head around how to approach this using the jetson-utils library. I attempted to use cudaResize() but did not have any luck. If someone could give me a push in the right direction, or has a snippet for functions like this, I would really appreciate it.

Thank you!

Hi,
Since OpenCV functions run on the CPU, the bottleneck may be CPU capability. To improve performance, a possible solution is to implement the function through the NvBufSurface APIs so that the hardware converter is used. The hardware converter can do resizing and cropping. These functions are implemented in the nvvidconv plugin, and you can download the package to check the source code:

Jetson Linux | NVIDIA Developer
Driver Package (BSP) Sources

For a demonstration of the NvBufSurface APIs, please check the samples in

/usr/src/jetson_multimedia_api
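
Whichever API ends up doing the cropping and resizing, the zoom & pan itself comes down to picking a source rectangle per frame. Here is a minimal sketch of that arithmetic in plain Python. The function name `zoom_pan_roi` and its parameters are hypothetical (not part of jetson-utils or NvBufSurface), and it assumes zoom >= 1 and an output no larger than the camera frame:

```python
def zoom_pan_roi(frame_w, frame_h, out_w, out_h, zoom=1.0, pan_x=0, pan_y=0):
    """Return (left, top, right, bottom) of the source region to crop.

    Hypothetical helper: zoom >= 1 shrinks the region (digital zoom in),
    pan_x / pan_y shift it in source pixels, and the result is clamped so
    it never leaves the frame.
    """
    zoom = max(1.0, zoom)

    # At zoom 1.0 the region is exactly the output size; dividing by the
    # zoom factor selects a smaller region that is later scaled up.
    roi_w = min(frame_w, max(1, round(out_w / zoom)))
    roi_h = min(frame_h, max(1, round(out_h / zoom)))

    # Start from a centered region, then apply the pan offset.
    left = (frame_w - roi_w) // 2 + int(pan_x)
    top = (frame_h - roi_h) // 2 + int(pan_y)

    # Clamp the top-left corner so the whole region stays inside the frame.
    left = min(max(0, left), frame_w - roi_w)
    top = min(max(0, top), frame_h - roi_h)

    return (left, top, left + roi_w, top + roi_h)


if __name__ == "__main__":
    # 4056x3040 camera frame shown in a 1920x2160 half of the display
    print(zoom_pan_roi(4056, 3040, 1920, 2160))            # centered, no zoom
    print(zoom_pan_roi(4056, 3040, 1920, 2160, zoom=2.0))  # 2x zoom
    print(zoom_pan_roi(4056, 3040, 1920, 2160, pan_y=55))  # per-camera offset
```

The tuple uses the same (left, top, right, bottom) layout as crop_roi in the jetson-utils snippet above. One possible way to use it there, untested on hardware, is to cudaCrop() each frame into a buffer of the ROI's size and then cudaResize() that buffer to the half-screen size. Cropping before resizing keeps the scaled image at display size instead of scaling the full 4056x3040 frame the way process_frame does. A fixed per-camera offset such as pan_y2 can be folded in by passing a different pan_y for the second camera.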

Thank you so much, I’ll look into these APIs.
