Run inference on a mix of fisheye and 'regular' cameras

• Hardware Platform - Jetson
• DeepStream Version - 7
• Issue Type - Question

I would like to run multiple fisheye cameras, each of which I split into 3 ‘virtual cameras’, as well as multiple non-fisheye cameras that each produce a single image. Can I run both of these camera types into the same nvstreammux > nvinfer pipeline?

I’m getting a bit confused between surfaces, frames, and batches. Typically when I run non-fisheye cameras, it’s a single frame per camera: when I have 9 cameras I use a batch size of 9, so the images from all the cameras get processed at once.

If I run a fisheye camera into nvdewarper and have it output 3 surfaces, I can feed that into nvstreammux by bumping the number of surfaces per frame up to 3. But how does this work with my other cameras, which output one surface each? Is this possible, and what is nvstreammux doing under the covers with regard to multiple surfaces (and a variable number of surfaces) when passing them on to nvinfer?

Here is some more info, as I hit problems even before throwing single-surface cameras into the mix. The command I run is:

gst-launch-1.0 \
  filesrc location=fisheye.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=0 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_0 \
  filesrc location=fisheye.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=1 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_1 \
  filesrc location=fisheye.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=2 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_2 \
  nvstreammux name=m width=960 height=544 batch-size=3 num-surfaces-per-frame=3 ! \
  nvinfer batch-size=3 config-file-path=/SiteBionicsBVR/SiteBionics.BVR/DeepStream/config_infer_primary_YoloV10.txt ! \
  nvdsosd ! nvmultistreamtiler width=2880 height=1632 rows=1 columns=3 ! nveglglessink

My config_dewarper.txt is:

[property]
output-width=960
output-height=544
num-batch-buffers=3

[surface0]
#FISH_PERSPECTIVE=4
projection-type=4
surface-index=0
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=0
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX
# Default is YXZ, i.e. yaw, then pitch, then roll
#rot-axes=ZXY
rot-axes=YXZ

[surface1]
#FISH_PERSPECTIVE=4
projection-type=4
surface-index=1
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=120
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX
# Default is YXZ, i.e. yaw, then pitch, then roll
#rot-axes=ZXY
rot-axes=YXZ

[surface2]
#FISH_PERSPECTIVE=4
projection-type=4
surface-index=2
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=240
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX
# Default is YXZ, i.e. yaw, then pitch, then roll
#rot-axes=ZXY
rot-axes=YXZ

This simulates 3 fisheye cameras, each with 3 virtual views produced by nvdewarper. A couple of interesting things: I see bounding boxes for the first surface of each camera, but not for the second and third surfaces. I also have to set the batch size on the nvstreammux and nvinfer components to 3 rather than the 9 I expected, or else the pipeline runs much slower, as if it were timing out before pushing the batch through inference. It’s as if the mux and infer components aren’t configured to account for the extra surfaces pushed by nvdewarper.

No. Currently nvstreammux does not support combining such a mix of frame types into one batch.

batch > frame > surface

A “batch” is the combination of several frames. The AI model handles the batch so that several frames are processed, or inferenced, in parallel. The frames in a batch need have no relationship to each other.

A “frame” is the image (or images) captured at some moment.

A “surface” is the image from one view of the frame. The surfaces of one frame all correspond to the same moment.

The batch helps the inference model process images from all cameras in parallel; GPU usage is more efficient with batching.
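
To make the hierarchy concrete, here is a minimal buffer-probe sketch in C, attached downstream of nvstreammux, using the standard gstnvdsmeta.h interfaces. As I understand it, each dewarped surface arrives as its own NvDsFrameMeta, identified by source_id plus surface_index:

#include <gst/gst.h>
#include "gstnvdsmeta.h"

/* Buffer probe that walks the batch -> frame -> surface hierarchy.
 * Each dewarped surface appears as its own NvDsFrameMeta. */
static GstPadProbeReturn
batch_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  NvDsBatchMeta *batch_meta =
      gst_buffer_get_nvds_batch_meta (GST_PAD_PROBE_INFO_BUFFER (info));
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  /* One NvDsFrameMeta per surface in the batch. */
  g_print ("batch with %u frame(s)\n", batch_meta->num_frames_in_batch);
  for (NvDsMetaList *l = batch_meta->frame_meta_list; l; l = l->next) {
    NvDsFrameMeta *fm = (NvDsFrameMeta *) l->data;
    g_print ("  source %u, surface %u of %d\n",
        fm->source_id, fm->surface_index, fm->num_surfaces_per_frame);
  }
  return GST_PAD_PROBE_OK;
}

/* Attach with:
 * gst_pad_add_probe (mux_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
 *     batch_probe_cb, NULL, NULL); */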

A variable number of surfaces per frame is not supported by DeepStream now.

A sample pipeline with a DeepStream test video and your dewarper config file (note batch-size=9 on nvstreammux, i.e. 3 sources x 3 surfaces per frame):

gst-launch-1.0 \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/fisheye_dist.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=0 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_0 \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/fisheye_dist.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=1 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_1 \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/fisheye_dist.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=2 num-output-buffers=3 config-file=config_dewarper.txt ! m.sink_2 \
  nvstreammux name=m width=960 height=544 batch-size=9 num-surfaces-per-frame=3 ! \
  nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! \
  nvmultistreamtiler width=2880 height=1632 rows=1 columns=3 ! nvdsosd ! nveglglessink

Thanks for the reply and the clarifications. I have the fisheye cameras with three virtual views (surfaces generated by nvdewarper) working fine all the way through inference and tracking. I’m clear on batch > frame > surface now.

I don’t want to give up having a single Jetson device working on both camera types. I want to brainstorm through a few options for accommodating the cameras that only produce one image. A couple of ideas:

  1. Run two pipelines: one with a muxer and inference configuration for the fisheye (multi-surface) cameras, and one for the single-surface cameras. Is this supported? How would the different inference nodes share the GPU?

  2. Write a component to pack the single surfaces from different cameras into a single frame, dealing with the timing slop of the 3 cameras being slightly off. So 3 single-frame cameras get packed and fed into the same mux as the fisheye (3-surface) cameras. The dewarper component has the guts to do this but doesn’t take in multiple source cameras. Is the dewarper component open source? I’m not sure how tracking would work with this; I’m assuming tracking works on source_id + surface_index pairs, so this should work.

  3. Dewarp a single-frame camera into a 3-surface frame, somehow collecting 3 different video frames from the same camera into the 3 surfaces of one output frame. As you mentioned, a frame represents a single point in time, so I would have to account for this as it goes into and out of the pipeline. This would make frame-dropping logic more complex, and I like it the least. And if tracking works on source_id + surface_index pairs, this may not work with tracking.

I like option 2 the best.

It is possible to run two separate pipelines.

What do you mean by “share the GPU”? There is only one GPU in a Jetson device; all GPU apps share the same GPU when running.
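
For example, a rough sketch of two independent gst-launch processes (the file names and the config_infer.txt path are placeholders; fakesink stands in for your real sink), one muxing dewarped fisheye surfaces and one muxing ordinary single-surface sources:

# Pipeline 1: one fisheye source, 3 dewarped surfaces per frame
# (batch-size = number of sources x num-surfaces-per-frame = 1 x 3)
gst-launch-1.0 \
  filesrc location=fisheye.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! \
    nvdewarper source-id=0 num-output-buffers=3 config-file=config_dewarper.txt ! mfish.sink_0 \
  nvstreammux name=mfish width=960 height=544 batch-size=3 num-surfaces-per-frame=3 ! \
  nvinfer batch-size=3 config-file-path=config_infer.txt ! nvdsosd ! fakesink &

# Pipeline 2: two ordinary sources, one surface per frame (the default)
gst-launch-1.0 \
  filesrc location=normal0.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! mflat.sink_0 \
  filesrc location=normal1.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! mflat.sink_1 \
  nvstreammux name=mflat width=960 height=544 batch-size=2 ! \
  nvinfer batch-size=2 config-file-path=config_infer.txt ! nvdsosd ! fakesink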

No. The dewarper component is proprietary.

The tracker works on objects; the source_id will help you identify which source each object belongs to.
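
For instance, a minimal sketch (again C, with the standard nvdsmeta.h header) of reading tracked objects per frame: object_id is the tracker-assigned ID, and source_id plus surface_index on the owning frame tell you which camera and which dewarped view the object came from:

#include <glib.h>
#include "nvdsmeta.h"

/* Sketch: report each tracked object together with the camera
 * (source_id) and dewarped view (surface_index) it belongs to. */
static void
print_tracked_objects (NvDsFrameMeta *frame_meta)
{
  for (NvDsMetaList *l = frame_meta->obj_meta_list; l; l = l->next) {
    NvDsObjectMeta *obj = (NvDsObjectMeta *) l->data;
    g_print ("object %" G_GUINT64_FORMAT " (class %d) from source %u, "
        "surface %u\n", obj->object_id, obj->class_id,
        frame_meta->source_id, frame_meta->surface_index);
  }
}

Note that, as far as I know, the tracker treats each surface as an independent stream, so an object moving between two dewarped views of the same camera will generally receive a new object_id.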