Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Xavier AGX (also tested on Orin)
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only): 4.6.3
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs):
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the contents of the configuration files, the command line used, and any other details needed to reproduce.)
I’ve modified the facial landmark app included in the deepstream_tao_apps repository to use an nvarguscamerasrc. The incoming stream is video/x-raw(memory:NVMM), 1080p, NV12 format, and 30fps.
I’ve set the nvstreammux batch-size to 1 and live-source to 1, and its output resolution matches the source.
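For reference, the mux setup is equivalent to something like this (a sketch; the batched-push-timeout of roughly one 30 fps frame is my assumption, not a value from the app):

    gst-launch-1.0 nvstreammux name=mux batch-size=1 live-source=1 \
        width=1920 height=1080 batched-push-timeout=33333 ! fakesink \
      nvarguscamerasrc ! \
        'video/x-raw(memory:NVMM), width=1920, height=1080, format=NV12, framerate=30/1' ! \
      mux.sink_0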
I haven’t tweaked anything else in the application; all the probe functions stay the same. I first tried rendering the result to the display, but that had extremely poor performance (0.3 - 1 fps), so I ultimately changed the sink to nvrtspoutsinkbin and render the stream on another desktop system, where I get a framerate that matches the console output.
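For completeness, the remote view can be approximated with a plain RTP-over-UDP branch (a stand-in for illustration only, not the app’s actual nvrtspoutsinkbin; the host address is a placeholder):

    gst-launch-1.0 nvarguscamerasrc ! \
      'video/x-raw(memory:NVMM), width=1920, height=1080, format=NV12, framerate=30/1' ! \
      nvv4l2h264enc bitrate=4000000 ! h264parse ! rtph264pay ! \
      udpsink host=192.168.1.100 port=5000 sync=false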
What I notice is that I get ~24 fps when only one face is present on screen, and performance degrades rapidly as the number of faces increases: 2 faces = ~12 fps, 4 faces = ~6 fps, and so on. After some reading I saw that I could increase the batch-size of the secondary inference plugin to handle the average number of detections per frame (see the sketch below), but this didn’t change the performance. There is also a lot of latency between what the pipeline renders and what the console outputs. For example, I can bring a picture of 3 faces into the frame and the console won’t update the Face Count print statement for ~2 seconds.
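The batch-size change amounts to a one-line edit of the SGIE config (the file name is from deepstream_tao_apps; the exact path in the repo may differ):

    # Raise the secondary (SGIE) batch size so several faces fit in one
    # TensorRT batch; nvinfer rebuilds the engine for the new batch size
    # on the next run.
    sed -i 's/^batch-size=.*/batch-size=4/' faciallandmark_sgie_config.txt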
I’m not sure why I’m getting such poor performance out of this pipeline. Looking online, NVIDIA has evaluated the FaceDetect model at 537 fps and the FPENet facial landmark estimator at 1015 fps. That’s a lot of headroom!
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and a description of the function.)
CPU usage for the pipeline stays consistent at 77.2%, while GPU utilization jumps around a lot. With 1 face it hovers between 45% and 91%, averaging about 70%; with 4 faces present it jumps between 6% and 78%, averaging around 28%. Is there a better way of measuring GPU utilization? Ideally I’d like stable GPU utilization.
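For context, a typical way to sample GPU load on Jetson (assuming stock JetPack tools) is tegrastats, and pinning the clocks removes DVFS-driven swings from the readings:

    # GR3D_FREQ in the tegrastats output is the GPU load; a 1 s interval
    # smooths the instantaneous spikes somewhat.
    sudo tegrastats --interval 1000
    # Max power mode plus pinned clocks stabilizes both the utilization
    # readings and the pipeline FPS.
    sudo nvpmodel -m 0 && sudo jetson_clocks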
The graph isn’t super descriptive, but it shows each element taking about 100 ms of processing time. For instance, vflip is just nvvideoconvert flip-method=4, and its processing time ranges between 40 ms and 160 ms!
If I run just a simple pipeline: nvarguscamerasrc ! "video/x-raw(memory:NVMM), width=1920, height=1080, format=NV12, framerate=30/1" ! nvvideoconvert flip-method=4 ! nv3dsink
even this nvvideoconvert takes between 15 ms and 40 ms, which seems like a lot for such a simple operation.
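For what it’s worth, GStreamer’s latency tracer can check these per-element numbers independently of DeepStream (a sketch; per-element output via flags=element needs GStreamer 1.18+, while JetPack 4.6 ships 1.14, which only reports end-to-end latency):

    GST_TRACERS='latency(flags=element)' GST_DEBUG='GST_TRACER:7' \
      gst-launch-1.0 nvarguscamerasrc ! \
      'video/x-raw(memory:NVMM), width=1920, height=1080, format=NV12, framerate=30/1' ! \
      nvvideoconvert flip-method=4 ! fakesink sync=false 2> latency_trace.log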
Here’s the latency measurement output of the faciallandmarks pipeline adapted to use nvarguscamerasrc as an input. The input stream is 1080p 30fps. I’ve removed nvtiler.
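For reference, DeepStream’s built-in latency measurement is enabled through environment variables before launching the app (the invocation below is schematic, with arguments elided):

    export NVDS_ENABLE_LATENCY_MEASUREMENT=1
    export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1   # per-plugin numbers
    ./deepstream-faciallandmark-app ...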
I just ran the same 4-face video through the deepstream-faciallandmark-app on a new Orin with JetPack 5.0.1 and DeepStream 6.1 and got an average of 25.10 fps. Obviously better than my Xaviers, but still lower than your Xavier performance.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.