Using multiple pipelines in TAO CV

Hello!
Is it possible to run two or more TAO CV API pipeline instances (gaze detector, emotion detector, etc.) in one program?
Run separately, the standard emotion and gaze detector examples work stably, even when the user moves quickly in front of the camera, leaves the camera’s field of view, or stands 3–5 m away.
I tried combining the code of the standard gaze and emotion detection examples.
The frame captured with OpenCV was copied into the first pipeline and moved into the second, as in the original code.
Then, in the main loop, I first ran emotion detection, followed by gaze estimation.
At the gaze-estimation stage the program periodically crashes with a Jarvis error about a mismatch between the number of detected faces and the number of landmarks. I tried checking for this mismatch and skipping the rest of the processing until the next loop iteration, but unfortunately that did not help; at most, my own messages about the face/landmark mismatch sometimes appeared on screen before the abnormal shutdown. As far as I understand, I cannot intervene in this situation, since the error occurs inside the binaries.
p.s. Perhaps I am doing something wrong?
Mismatch between landmarks and face detections.txt (2.4 KB)
emotional_gaze.cpp (16.6 KB)
p.s. Thx in advance

The error is similar to the log mentioned in topic Error when running Emotion Classification Sample in TAO Computer Vision Inference Pipeline.
The inference pipeline is planned to be deprecated in the next release, so we recommend the DeepStream inference path instead: deepstream_tao_apps/apps/tao_others at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

Yes, I saw the problem described in that thread, BUT there it occurred with the emotion detector, while in my case the potential problem is in the gaze detector. After all, completely different models are responsible for gaze estimation and emotion classification.
p.s. I sincerely hope that the TAO Toolkit CV Inference Pipeline L4T will not have to be thrown away, since a lot of time and a lot of nerve cells were spent on mastering the following tools:

Sorry for the inconvenience. I’ll sync with the internal team to check whether there is an update for this mismatch error.

The day before yesterday I tested the gaze detector separately; it worked fine for 15–20 minutes, but that is apparently an isolated case. Sometimes it crashes if you tilt your head forward and the detector loses the face.

Do you still have the error log?

Perhaps it was reacting to the face images on a wall calendar. I removed the calendar, but it still sometimes crashes with the above error; the attachments contain the full output of the gaze detector example and of the program from the topic header.
I also tried showing a picture from a smartphone screen: sometimes it works fine, sometimes it crashes.
gaze_crash.txt (17.9 KB)
emotional_gaze_crash.txt (24.5 KB)

Could you try the experiments below?

  1. Try to run with fewer faces in view.
  2. Try to run with recorded video files instead of the camera.

What are the requirements for video recordings: resolution, bit rate, format?
I ran the standard examples using a webcam (Logitech C920) at 640x480 @ 30 fps with the decoded-image flag enabled.

You can record an mp4 clip and configure it in the demo.conf file.
For example,

# Path to device handle; ensure config.sh can see this handle
#video_path=/dev/video0
video_path=/tmp/test.mp4
fps=30
# Boolean indicating whether handle is pre-recorded
#is_video_path_file=false
is_video_path_file=true

# Desired resolution (width, height, channels)
#resolution_whc=1280,720,3
resolution_whc=640,480,3

Refer to Running and Building Sample Applications — TAO Toolkit 3.0 documentation
The config files will contain the following common fields:

  • video_path : the device handle or an absolute path to a video file.
  • fps : the frame rate at which to open video_path. Ensure your device can handle this.
  • is_video_path_file : a boolean (true or false) indicating whether video_path is a file.
  • resolution_whc : the resolution (width, height, channels) at which to open video_path. Ensure your device can handle this.

I ran a series of tests:

  • 4 tests using video files, including files from NVIDIA containers (1 ran successfully for ~20 minutes; the other 3 crashed almost immediately);
  • 1 test using a webcam: it ran successfully for ~25 min. I sat almost motionless in front of the camera, periodically moving my gaze around the perimeter of the monitor and slightly turning my head (unfortunately, I could not determine the cause of the earlier abnormal shutdowns of the detector).
    In all tests the flag was set:
    use_decoded_image_api=true

Console output, minidumps and original videos in the attached archive.
report_23-24.11.2021.zip (20.7 MB)

May I know whether you are testing with the default (unmodified) gaze application (./samples/tao_cv/demo_gaze/gaze)?

In my tests I used the original gaze_demo example.
For now I am not using the program from the topic header, since the problem shows up precisely in the gaze processing, which is why I use gaze_demo.
The tests marked as successful I interrupted with Ctrl + C in order to move on to the next one.

Thanks for the info. I will take time to try to reproduce.
Also, note that the inference pipeline (including this gaze_demo) will not be supported in the next release (coming very soon).
For gaze, please run inference with deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

May I ask whether you have managed to reproduce the error mentioned above, and whether it can somehow be fixed?

I would also like to clarify some details about the capabilities of the software you recommend:
1. Stability compared to the TAO Toolkit Computer Vision Inference Pipeline for L4T ( TAO Toolkit CV Inference Pipeline L4T | NVIDIA NGC ) — the gaze, emotion, gesture, and other detectors:
1) the gaze detector crashed due to a mismatch between the number of detected faces and detected landmarks; the emotion detector did the same, though a little less often;
2) the gesture detector classified every gesture listed in the documentation as a “random gesture” (regardless of the value of the right-hand binding flag).
2. Compatible camera types (webcam / CSI camera), supported pixel formats, etc.
3. Compatible JetPack firmware versions, for example:
1) the Gaze Demo Container for Jetson ( Gaze Demo for Jetson/L4T | NVIDIA NGC ) required one specific JetPack version, 4.4 Developer Preview (DP);
2) the TAO Toolkit Computer Vision Inference Pipeline for L4T ( TAO Toolkit CV Inference Pipeline L4T | NVIDIA NGC ) only started on JetPack 4.5.1 and did not work on JP 4.6.
4. The ability to update the OpenCV libraries built into the container by sharing the OpenCV version installed on the host.

Sorry, I’m still checking. To confirm your latest comment: deepstream_tao_apps/apps/tao_others/deepstream-gaze-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub will also meet this random crash, right?

I’m only rushing you for an answer because topics on this forum are usually closed automatically after 2 weeks.
First, I wanted to clarify information about the DeepStream software you proposed and its hardware and software requirements.
Are there any conflicts with the already installed “TAO Toolkit Computer Vision Inference Pipeline for L4T”?
p.s. The day before yesterday I added BodyPose processing to the program from the topic header using the “TAO Toolkit Computer Vision Inference Pipeline for L4T”, and again received the error about a mismatch between the number of detected faces and the number of landmarks (the program runs for about 10–30 seconds and then exits with an error).
p.s. p.s. Sorry for my poor English; I have to use Google’s machine translation.