• Hardware Platform - Jetson
• DeepStream Version - 7
• Issue Type - Question
I would like to run multiple fisheye cameras, each split into 3 "virtual cameras", alongside multiple non-fisheye cameras that each produce a single image. Can I feed both of these camera types into the same nvstreammux > nvinfer pipeline?
I'm getting a bit confused between surfaces, frames, and batches. Typically, when I run non-fisheye cameras, it's a single frame per camera. When I have 9 cameras I use a batch size of 9, so the images from all the cameras get processed at once.
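For context, the non-fisheye setup is wired roughly like this (a minimal sketch using the DeepStream Python bindings; the RTSP URIs are placeholders for my actual sources):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("single-surface")

# One nvstreammux batching one frame from each of the 9 cameras.
streammux = Gst.ElementFactory.make("nvstreammux", "mux")
streammux.set_property("batch-size", 9)
streammux.set_property("width", 960)
streammux.set_property("height", 544)
streammux.set_property("batched-push-timeout", 40000)
pipeline.add(streammux)

for i in range(9):
    src = Gst.ElementFactory.make("uridecodebin", f"src-{i}")
    src.set_property("uri", f"rtsp://camera-{i}/stream")  # placeholder URI
    pipeline.add(src)
    sinkpad = streammux.get_request_pad(f"sink_{i}")

    def on_pad_added(dbin, pad, sinkpad=sinkpad):
        # Link the decoded video pad to this camera's mux sink pad
        # (caps checking omitted for brevity).
        pad.link(sinkpad)

    src.connect("pad-added", on_pad_added)
```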
If I run each fisheye camera into nvdewarper and have it output 3 surfaces, I can feed that into nvstreammux by bumping the number of surfaces per frame up to 3. But then how does this work with my other cameras, which output one surface each? Is this possible, and what is nvstreammux doing under the covers with regard to multiple surfaces (and a variable number of surfaces per source) when it passes batches on to nvinfer? Here is my dewarper config:
[property]
output-width=960
output-height=544
num-batch-buffers=3
[surface0]
# projection-type 4 = FISH_PERSPECTIVE
projection-type=4
surface-index=0
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=0
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw.
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX.
# Default is YXZ, i.e. yaw is applied first, then pitch, then roll.
rot-axes=YXZ
[surface1]
# projection-type 4 = FISH_PERSPECTIVE
projection-type=4
surface-index=1
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=120
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw.
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX.
# Default is YXZ, i.e. yaw is applied first, then pitch, then roll.
rot-axes=YXZ
[surface2]
# projection-type 4 = FISH_PERSPECTIVE
projection-type=4
surface-index=2
#dewarped surface parameters
width=960
height=544
top-angle=45
bottom-angle=-45
yaw=0
pitch=35
roll=240
focal-length=350
# Z axis corresponds to roll, X to pitch, and Y to yaw.
# Six combinations are possible: XYZ, XZY, YXZ, YZX, ZXY, ZYX.
# Default is YXZ, i.e. yaw is applied first, then pitch, then roll.
rot-axes=YXZ
This simulates 3 fisheye cameras, each with 3 virtual views (surfaces) produced by nvdewarper. A couple of interesting observations:

I see bounding boxes for the first surface of each camera, but not for the second and third surfaces.

I also have to set the batch size on the nvstreammux and nvinfer components to 3 rather than the 9 I expected, or else it runs much slower, as if the muxer is timing out before pushing the batch through inference. It's as if the mux and infer components aren't configured to account for the extra surfaces pushed by nvdewarper.
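In case it helps, here is roughly how I'm wiring the fisheye side in code (a sketch in the same Python style as above; `pipeline` and each camera's decoded/converted branch are assumed to already exist, and config_dewarper.txt is the config shown above):

```python
# 3 fisheye sources, each through its own nvdewarper producing 3 surfaces
# per frame, batched by a single nvstreammux.
streammux = Gst.ElementFactory.make("nvstreammux", "mux")
streammux.set_property("width", 960)
streammux.set_property("height", 544)
streammux.set_property("num-surfaces-per-frame", 3)  # matches num-batch-buffers=3
streammux.set_property("batch-size", 3)  # 3 sources; 9 stalls as described above
streammux.set_property("batched-push-timeout", 40000)
pipeline.add(streammux)

for i in range(3):
    dewarper = Gst.ElementFactory.make("nvdewarper", f"dewarper-{i}")
    dewarper.set_property("config-file", "config_dewarper.txt")
    dewarper.set_property("source-id", i)
    pipeline.add(dewarper)
    # (Each camera's decoded/converted branch links to this dewarper's sink pad.)
    dewarper.get_static_pad("src").link(streammux.get_request_pad(f"sink_{i}"))
```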
No. Currently nvstreammux does not support mixing frames with different numbers of surfaces in the same batch.
batch > frame > surface
A "batch" is a combination of several frames. The AI model can process a batch so that several frames are inferenced in parallel. The frames in a batch have no relationship to each other.
A "frame" is the image(s) captured at a single moment in time.
A "surface" is the image from one view of the frame. The surfaces of one frame all belong to the same moment.
Batching helps the model inference images from all cameras in parallel; GPU usage is more efficient with batching.
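To see this hierarchy in the metadata, a buffer probe downstream of nvstreammux can walk it (a minimal sketch with the DeepStream Python bindings; NvDsFrameMeta carries source_id, surface_index, and num_surfaces_per_frame):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def batch_hierarchy_probe(pad, info, u_data):
    # One NvDsBatchMeta per buffer; it holds a list of NvDsFrameMeta.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Each frame records its camera (source_id) and, for dewarped
        # streams, which virtual view it is (surface_index).
        print(f"source_id={frame_meta.source_id} "
              f"surface_index={frame_meta.surface_index} "
              f"surfaces_per_frame={frame_meta.num_surfaces_per_frame}")
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# Attach downstream of nvstreammux, e.g. on nvinfer's sink pad:
# pgie.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, batch_hierarchy_probe, 0)
```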
Thanks for the reply and the clarifications. I have the fisheye cameras with three virtual views (surfaces generated by nvdewarper) working fine all the way through inference and tracking. I'm clear on batch > frame > surface now.
I don't want to give up on having a single Jetson device handle both camera types, so I want to brainstorm a few options for accommodating the cameras that only produce one image. A couple of ideas:
Run two pipelines: one with a muxer and inference configuration for the fisheye (multi-surface) cameras, and one for the single-surface cameras (see the sketch after this list). Is this supported? How would the different inference nodes share the GPU?
Write a component that packs the single surfaces from different cameras into one frame, and deal with the timing slop of the 3 cameras being slightly off. Then 3 single-frame cameras get packed and fed into the same mux as the fisheye (3-surface) cameras. The dewarper component has the guts to do this but doesn't take in multiple source cameras. Is the dewarper component open source? I'm not sure how tracking would work with this; I'm assuming tracking works on source_id + surface_index pairs, so this should work.
Dewarp a single-frame camera into a 3-surface frame, but somehow collect 3 different video frames from the same camera into the 3 surfaces of one output frame. As you mentioned, a frame represents a single point in time, so I would have to account for this going into and out of the pipeline. This would make frame-dropping logic more complex, and I like it the least. If tracking works on source_id + surface_index pairs, this may not work with tracking.
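For the first idea, this is roughly what I have in mind (a rough sketch; infer_fisheye.txt and infer_flat.txt are hypothetical config files, and the sources are omitted): two independent pipelines in one process, each with its own mux and nvinfer, with the GPU shared between the two inference contexts by ordinary CUDA scheduling.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Pipeline A: fisheye cameras, 3 dewarped surfaces per frame.
fisheye_pipe = Gst.parse_launch(
    "nvstreammux name=mux_fish batch-size=3 num-surfaces-per-frame=3 "
    "width=960 height=544 batched-push-timeout=40000 ! "
    "nvinfer config-file-path=infer_fisheye.txt ! fakesink"
)
# Pipeline B: regular cameras, one surface per frame.
flat_pipe = Gst.parse_launch(
    "nvstreammux name=mux_flat batch-size=6 "
    "width=960 height=544 batched-push-timeout=40000 ! "
    "nvinfer config-file-path=infer_flat.txt ! fakesink"
)
# (Sources would be created and linked to each muxer's request
# sink pads as usual; omitted here.)

fisheye_pipe.set_state(Gst.State.PLAYING)
flat_pipe.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```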