Processing 4K Images with DeepStream on Jetson

• Hardware Platform (Jetson / GPU) Jetson Orin Nano
• DeepStream Version 7.0
• JetPack Version (valid for Jetson only) 6.0
• Issue Type( questions, new requirements, bugs) Question

Hi, I’m working on a DeepStream pipeline to process four 4K (3840x2160) streams with a YOLOv8 model.
Ideally, I’d like to avoid the information loss that comes from downscaling the video streams, and use an inference size of 640x360. If I cut each stream into tiles of that size, it would make 36 tiles per stream (3840/640 = 6 columns by 2160/360 = 6 rows), which seems rather excessive. I could group them into batches of 36 to speed up inference, but I’m not sure my Jetson Orin Nano would be powerful enough for that. It would be good to get around 15-20 FPS.
What’s your opinion on this? Is this kind of processing feasible on the Jetson? Are there other, more optimized solutions?

Thanks in advance and have a nice day.

  1. From the product introduction doc, the Orin Nano only supports 2x 4K30 (H.265) decoding. What is your 4K source’s fps? Which sample are you testing or referring to?
  2. Did you try inference on the 4K source without cutting it? Are the results acceptable? 1920x1080 is not big; you could cut each 4K frame into 2x2 tiles. Please refer to the deepstream-preprocess-test sample; you only need to set the ROIs in nvdspreprocess’s configuration file (see the sketch below). Please also refer to this link for performance improvement.
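For reference, the ROI part of the nvdspreprocess configuration for 2x2 tiles of one 3840x2160 source could look roughly like the fragment below. Only the [group-0] section is shown; the [property] section (model input shape, custom-lib-path, etc.) should be kept as in the config_preprocess.txt that ships with the deepstream-preprocess-test sample, and the key names follow that sample.

cat > config_preprocess_tiles.txt <<'EOF'
# ROI group for source 0: four 1920x1080 tiles covering a 3840x2160 frame.
# Each ROI is listed as left;top;width;height, concatenated.
[group-0]
src-ids=0
process-on-roi=1
roi-params-src-0=0;0;1920;1080;1920;0;1920;1080;0;1080;1920;1080;1920;1080;1920;1080
EOF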

I am at the beginning of my tests, and when I run the DeepStream test samples, they seem to automatically downscale my streams since I am using a 640x640 YOLOv8 model.

Could you explain what the 2x 4K30 value in the documentation means? Is this a hardware limitation, or should I be able to process streams in DeepStream one by one or two at a time? If it is a software limitation, do we have enough power to run CNN models with these two 4K30 streams?

Thanks for your time.

The model’s dimensions are 640x640; all input data will be scaled to 640x640 for inference.
It is a hardware limitation. Video decoding uses the dedicated NVDEC hardware; please find it in the Orin technical brief. CNN model inference runs on the GPU, so it is fine to run decoding and inference at the same time. You can use the jtop command-line tool to monitor resource utilization.
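If jtop is not installed yet, it comes from the jetson-stats Python package (the commands below are the usual ones; please double-check against the jetson-stats documentation for your JetPack version):

sudo pip3 install -U jetson-stats   # provides the jtop utility
# a reboot (or restarting the jetson_stats service) is usually required once after installation
jtop                                # shows GPU, NVDEC/NVENC, CPU and memory utilization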

That’s why I’d like to divide my streams into 640x640 tiles. Do you know whether this limitation still allows connecting four 4K30 RTSP sources (or other input types) in DeepStream and processing them one by one, while still handling each stream at full resolution?

You can use a simplified gst-launch command to check whether the Orin Nano can support decoding four 4K30 streams at the same time. Even if there is no resource-limitation error, the fps of each stream will likely be somewhat lower than 30.
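As an example, a decode-only test for four streams could look like the command below. The RTSP URIs are placeholders and H.265 sources are assumed (use rtph264depay/h264parse for H.264); fpsdisplaysink with -v prints the measured fps of each branch to the console.

gst-launch-1.0 -v \
  rtspsrc location=rtsp://camera-1/stream ! rtph265depay ! h265parse ! nvv4l2decoder ! fpsdisplaysink text-overlay=false video-sink=fakesink sync=false \
  rtspsrc location=rtsp://camera-2/stream ! rtph265depay ! h265parse ! nvv4l2decoder ! fpsdisplaysink text-overlay=false video-sink=fakesink sync=false \
  rtspsrc location=rtsp://camera-3/stream ! rtph265depay ! h265parse ! nvv4l2decoder ! fpsdisplaysink text-overlay=false video-sink=fakesink sync=false \
  rtspsrc location=rtsp://camera-4/stream ! rtph265depay ! h265parse ! nvv4l2decoder ! fpsdisplaysink text-overlay=false video-sink=fakesink sync=false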

What do you mean by “handle each stream at full resolution”? nvstreammux will convert each source to a specific resolution; how do you set nvstreammux’s width and height? If you use nvdspreprocess’s ROIs, each ROI area will be scaled to the model dimensions for inference. If you don’t use nvdspreprocess’s ROIs, the whole frame (whose width/height are the same as nvstreammux’s width/height) will be scaled to the model dimensions for inference.

Well, if I use an nvstreammux with my streams without downsizing their dimensions, this would create a 7680x4320 stream, am I correct? Then I could create one (or several?) ROIs, but with this pipeline configuration I should not be able to manage 4 full-resolution streams according to the doc…

So I’m wondering if it’s possible to have a mux that lets one or two streams through at a time, in order to process them separately and respect the technical limitations. Is this possible, or does the limitation come directly from the pipeline input with rtspsrc and h264parse?

No, 4K’s resolution is 4096×2160.
If the 4K source is a local file, you can use appsrc to control the fps. Please refer to the following solution:
appsrc(4k,15fps)->decoder->|
appsrc(4k,15fps)->decoder->|
appsrc(4k,15fps)->decoder->|
appsrc(4k,15fps)->decoder->| nvstreammux(batch-size=4, w,h=3840x2160) -> nvdspreprocess(each source has 4 ROIs of 1920x1080) -> nvinfer -> …
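A rough gst-launch approximation of this layout for local files is shown below (appsrc needs application code to push buffers, so filesrc/qtdemux stands in here for H.265 MP4 files; drop-frame-interval=2 on nvv4l2decoder is one way to bring 30 fps down to 15 fps; the file names and config file names are placeholders, and input-tensor-meta tells nvinfer to use the tensors prepared by nvdspreprocess):

gst-launch-1.0 \
  nvstreammux name=mux batch-size=4 width=3840 height=2160 batched-push-timeout=40000 ! \
    nvdspreprocess config-file=config_preprocess.txt ! \
    nvinfer config-file-path=pgie_config.txt input-tensor-meta=true ! fakesink sync=false \
  filesrc location=cam0.mp4 ! qtdemux ! h265parse ! nvv4l2decoder drop-frame-interval=2 ! queue ! mux.sink_0 \
  filesrc location=cam1.mp4 ! qtdemux ! h265parse ! nvv4l2decoder drop-frame-interval=2 ! queue ! mux.sink_1 \
  filesrc location=cam2.mp4 ! qtdemux ! h265parse ! nvv4l2decoder drop-frame-interval=2 ! queue ! mux.sink_2 \
  filesrc location=cam3.mp4 ! qtdemux ! h265parse ! nvv4l2decoder drop-frame-interval=2 ! queue ! mux.sink_3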

If the 4K source is RTSP, the decoder may not be able to decode four 4K30 streams at once; please use jtop to check the decoder utilization. You can also lower the source’s fps.


Here is my jtop view, but I’m not sure where to find the decoder utilization. I’m currently using the Python deepstream-test3 pipeline. Do you think this solution is suitable for my application? Given that the inference size is 640x360, should I add elements between nvdspreprocess and nvinfer to minimize data loss?

Please refer to this topic for how to get the NVDEC utilization, for example: NVDEC 99%@998.

Thanks, my NVDEC is indeed overloaded. Since I have a Jetson Orin Nano, I’ll start by processing 1080p streams. As I still need to be able to split the streams into several 640x360 parts, which option would be the most suitable? I’m thinking about creating a tee and then cropping each zone with nvvideoconvert.

  1. Why do you still need to split the streams into several 640x360 parts? To avoid losing information by downscaling the video streams? Did you try not splitting the streams? Converting 1080p to 640x360 may not affect the model’s detection accuracy that much.
  2. I suggest using nvdspreprocess because it is easy to use. Please refer to the sample deepstream_preprocess_test.py; you only need to set four 640x360 ROIs for each 1080p source.
  3. About “crop each zone with nvvideoconvert”: you can use nvvideoconvert to crop the designated area. Here is a sample that crops the 0,0,300,300 area:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.jpg ! jpegdec ! 'video/x-raw,format=I420' ! nvvideoconvert src-crop="0:0:300:300" ! jpegenc ! filesink location=7.jpeg
  1. Yes, to avoid losing information for the inference. I bought 4K cameras because I need to observe objects at a distance; that’s the advantage over a standard camera.
  2. As I need to cover the entire area, I’m not sure that 4 ROIs are enough.

Sorry for the late reply. Is this still a DeepStream issue that needs support? Thanks!

Well, the best architecture I’ve found is to connect a tee to each source to divide the stream into 9 tiles. Then I isolate the desired part with nvvideoconvert and gather all these tiles with a batch-size=36 nvstreammux before sending the whole thing to the pgie, roughly as in the sketch below.
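For one camera it looks roughly like this (only the first two of the nine crop branches are shown, the others use the same pattern with different src-crop offsets; the URI and pgie config file name are placeholders):

gst-launch-1.0 \
  nvstreammux name=mux batch-size=36 width=640 height=360 batched-push-timeout=40000 ! \
    nvinfer config-file-path=pgie_config.txt ! fakesink sync=false \
  uridecodebin uri=rtsp://camera-1/stream ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! tee name=t0 \
  t0. ! queue ! nvvideoconvert src-crop="0:0:640:360" ! mux.sink_0 \
  t0. ! queue ! nvvideoconvert src-crop="640:0:640:360" ! mux.sink_1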

It works, but I have a delay of 7 seconds, which needs to be improved… What do you think of this architecture? Would adding a pgie in parallel help?

  1. If using the RTSP protocol, the bitrate of 4K is quite high. Did you check whether there is a network receiving issue?
  2. Do you mean using nvvideoconvert’s src-crop to crop the desired area, then sending it to nvstreammux? You can simplify the pipeline to narrow down the delay issue, for example by removing the pgie or reducing the number of tee branches (9). Please refer to my comment on Aug 3: I suggest using the ready-made sample deepstream-preprocess-test.

Yes, I’ve noticed that the 4K format is too big, so I’m going to scale back my ambitions with this version of the Jetson. Thanks for your help.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.