Intermittent crash in EGLStream::FrameConsumer::create called from 09_camera_jpeg_capture because of apparent Argus Specification violation

Hi,
It appears that 09_camera_jpeg_capture from Jetson_Multimedia_API_R35.1.0_aarch64
violates a requirement of the “Argus 0.99 API Specification” (Argus.0.99.pdf):
“Within an Argus application, all captures on a single session must be performed by a single thread”
However, 09_camera_jpeg_capture/main.cpp creates 2 threads, previewConsumerThread and captureConsumerThread, which call EGLStream::FrameConsumer::create simultaneously and crash intermittently.
To reproduce, install systemd-coredump, then run:

for i in {1..1000}
do
DISPLAY=:0 /home/me/tests/jetson_multimedia_api/samples/09_camera_jpeg_capture/camera_jpeg_capture
done
coredumpctl
You will see numerous crashes similar to this one:
Mon 2022-11-07 17:05:51 PST 93969 1000 1000 11 present /home/me/tests/jetson_multimedia_api/samples/09_camera_jpeg_capture/camera_jpeg_capture
I saw about 7 crashes per 1000 runs.
coredumpctl bt
#0 0x0000000000000000 in ?? ()
#1 0x0000ffffbbbf77b8 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
#2 0x0000ffffbbbf7974 in EGLStream::FrameConsumer::create(void*, void*, Argus::Status*) () from /usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so
#3 0x0000aaaab8d466b8 in ArgusSamples::ConsumerThread::threadInitialize() ()
#4 0x0000aaaab8d47090 in ArgusSamples::CaptureConsumerThread::threadInitialize() ()
#5 0x0000aaaab8d8b888 in ArgusSamples::Thread::threadFunction() ()
#6 0x0000aaaab8d8b810 in ArgusSamples::Thread::threadFunctionStub(void*) ()
#7 0x0000ffffbd727624 in start_thread (arg=0xaaaab8d8b7f4 ArgusSamples::Thread::threadFunctionStub(void*)) at pthread_create.c:477
#8 0x0000ffffbb8ef49c in thread_start () at …/sysdeps/unix/sysv/linux/aarch64/clone.S:78

Clearly, calling EGLStream::FrameConsumer::create from multiple threads on the same iCaptureSession is illegal.
It can be fixed by moving that initialization to the main thread.
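
For illustration, here is a minimal C++ sketch of the "initialize on the main thread" idea. It does not use Argus at all: StubFactory, StubConsumer, consumeFrames and runWithMainThreadInit are hypothetical stand-ins, and the assumption that the create call must not be entered by two threads at once comes from the backtrace above, not from any confirmed statement about the library.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-in for a consumer object; NOT the Argus/EGLStream API.
struct StubConsumer {
    std::string name;
    int framesHandled = 0;
};

// Hypothetical factory that is assumed to be unsafe to enter from two
// threads at once (it updates state without any locking).
class StubFactory {
public:
    std::unique_ptr<StubConsumer> create(const std::string& name) {
        ++createCalls_;  // unsynchronized on purpose: callers must serialize
        auto consumer = std::make_unique<StubConsumer>();
        consumer->name = name;
        return consumer;
    }
    int createCalls() const { return createCalls_; }

private:
    int createCalls_ = 0;
};

// Worker body: only uses the consumer it was handed, never calls the factory.
void consumeFrames(StubConsumer& consumer, int frameCount) {
    for (int i = 0; i < frameCount; ++i) {
        ++consumer.framesHandled;
    }
}

// Creates both consumers on the calling thread, then starts the workers.
// Returns the total number of frames handled by both workers.
int runWithMainThreadInit(int framesPerConsumer) {
    StubFactory factory;
    auto preview = factory.create("preview");  // sequential, same thread
    auto capture = factory.create("capture");
    assert(factory.createCalls() == 2);

    std::thread previewThread(consumeFrames, std::ref(*preview), framesPerConsumer);
    std::thread captureThread(consumeFrames, std::ref(*capture), framesPerConsumer);
    previewThread.join();
    captureThread.join();
    return preview->framesHandled + capture->framesHandled;
}
```

Each worker owns its own consumer, so the only shared, unsynchronized step (create) happens before any thread is started. Whether the later per-frame calls are safe across threads is a separate question that this sketch does not answer.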
But it is not clear whether the rest of the code is correct, in particular whether calling iFrameConsumer->acquireFrame from multiple threads is OK or not.

Could you please rewrite the jetson_multimedia_api samples to comply with the specification, and explain which APIs can be called from multiple threads and which cannot?

Thank you

Hi,
Do you hit the issue when running the default sample, or after adding a patch? The default sample is verified and should work fine.

I just downloaded a clean Jetson_Multimedia_API_R35.1.0_aarch64.tbz2, extracted it and retested:
mkdir ~/tests/Jetson_Multimedia_API_R35.1.0_aarch64
cd ~/tests/Jetson_Multimedia_API_R35.1.0_aarch64
tar -xvf ~/tests/Jetson_Multimedia_API_R35.1.0_aarch64.tbz2
cd ~/tests/Jetson_Multimedia_API_R35.1.0_aarch64/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture
make
for i in {1..1000}
do
DISPLAY=:0 ~/tests/Jetson_Multimedia_API_R35.1.0_aarch64/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture/camera_jpeg_capture
done
coredumpctl
Sure enough, there are several crash dumps, all like the one I copied above.

The default sample is verified and should work fine.

It cannot have worked reliably, because it clearly violates the requirement “Within an Argus application, all captures on a single session must be performed by a single thread”: 09_camera_jpeg_capture calls EGLStream::FrameConsumer::create from 2 threads on the same iCaptureSession, and later calls iFrameConsumer->acquireFrame, again on 2 threads.

Hi,
We tried the following script and did not hit the issue:

#!/bin/bash

export DISPLAY=:0
i=1
while [ "$i" != "100" ]
do
    echo "loop" $i
    /usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture/camera_jpeg_capture
    sleep 1
    i=$(($i+1))
done
coredumpctl

The issue you are facing looks more like the camera sensor is not stable and sometimes fails to launch, which triggers the error. We suggest checking the stability of the camera sensor.

  1. I just reproduced the crash with a different camera (imx274).
  2. 100 repeats is not enough. Try 1000 or 10000, because multithreading race conditions are intermittent in nature.
  3. Again, your code clearly violates your own requirements. Please explain this.
  4. Even if you are right and the crash was caused by faulty hardware, your code must never crash. We were planning to use Orin for a mission-critical application, and it must be proven not to crash.

Were you able to reproduce the problem? Any chance of fixing it?

Hi,
We don’t observe it with our setup. It seems to be specific to the camera sensor. Could you share which camera sensor you are using? We have camera partners and are not sure whether your camera comes from one of them.

Hi,
I already wrote that I reproduced it with 2 different sensors: originally I used an Omnivision sensor, but then I reproduced it with IMX274 to make it easier for you, since it uses a driver provided by Nvidia.
The problem is not related to the sensor, but to a bug in the 09_camera_jpeg_capture sample code, which illegally invokes EGLStream::FrameConsumer::create from 2 threads simultaneously, causing a race condition and a null pointer access.
If you show this to any of your developers, they will understand what I am talking about.
Thank you

Hi,
These two samples demonstrate the multi-stream use case:

/usr/src/jetson_multimedia_api/argus/samples/multiStream
/usr/src/jetson_multimedia_api/samples/09_camera_jpeg_capture

Argus supports multiple streams, but it may not be stable in certain corner cases. If a single stream runs stably for you, please create a single stream and call createNvBuffer() and copyToNvBuffer(), so that it works like this:

1920x1080 YUV420 NvBuffer -> 640x480 RGBA NvBuffer
                          -> 1280x720 YUV420 NvBuffer

This should work similar to creating multi streams. Please give it a try.
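
As a rough illustration of the "acquire once, copy twice" shape described above, here is a small C++ sketch. It is not Argus code: StubFrame, acquireStubFrame, copyScaled and acquireOnceCopyTwice are hypothetical names, the frame is a plain one-byte-per-pixel buffer rather than YUV420 or RGBA, and the nearest-neighbour loop merely stands in for whatever scaling and conversion the real copy performs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical frame type: one byte per pixel, NOT an NvBuffer.
struct StubFrame {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;
};

// Stand-in for acquiring one full-resolution frame from a single stream.
StubFrame acquireStubFrame(int width, int height) {
    StubFrame frame;
    frame.width = width;
    frame.height = height;
    frame.pixels.assign(static_cast<std::size_t>(width) * height, 128);
    return frame;
}

// Stand-in for a scaling copy: nearest-neighbour resample into a new buffer.
StubFrame copyScaled(const StubFrame& src, int width, int height) {
    assert(src.width > 0 && src.height > 0 && width > 0 && height > 0);
    StubFrame dst;
    dst.width = width;
    dst.height = height;
    dst.pixels.resize(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const int srcY = y * src.height / height;
        for (int x = 0; x < width; ++x) {
            const int srcX = x * src.width / width;
            dst.pixels[static_cast<std::size_t>(y) * width + x] =
                src.pixels[static_cast<std::size_t>(srcY) * src.width + srcX];
        }
    }
    return dst;
}

// One acquire, two copies: a 640x480 preview and a 1280x720 capture buffer,
// both produced on the calling thread from the same source frame.
std::pair<StubFrame, StubFrame> acquireOnceCopyTwice() {
    const StubFrame full = acquireStubFrame(1920, 1080);
    StubFrame preview = copyScaled(full, 640, 480);
    StubFrame capture = copyScaled(full, 1280, 720);
    return std::make_pair(std::move(preview), std::move(capture));
}
```

The point of the shape is that only one thread ever touches the stream; the two outputs can then be handed to a renderer and an encoder separately.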

Hi,
I found that
jetson_multimedia_api/samples/09_camera_jpeg_capture/main.cpp
still crashes during initialization, and this is not a corner case.
However,
jetson_multimedia_api/argus/samples/multiStream/main.cpp
has not crashed so far, because it differs from 09_camera_jpeg_capture. It calls:
PROPAGATE_ERROR(previewConsumerThread.initialize());
PROPAGATE_ERROR(previewConsumerThread.waitRunning());
before
PROPAGATE_ERROR(jpegConsumerThread.initialize());
so EGLStream::FrameConsumer::create is not called simultaneously by 2 threads, as it is in 09_camera_jpeg_capture/main.cpp.
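
The ordering described above (start one thread, wait until it is running, only then start the next) can be sketched in standard C++ as follows. This is only an approximation under stated assumptions: StubWorker is a hypothetical class that mimics the initialize()/waitRunning() names from the samples, the 5 ms sleep stands in for the real initialization work, and the global counters exist only to measure how many threads are inside the init step at once.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Demonstration-only counters: threads currently in init, and the peak.
std::atomic<int> g_inInit{0};
std::atomic<int> g_maxOverlap{0};

// Hypothetical worker; NOT the ArgusSamples::Thread class.
class StubWorker {
public:
    ~StubWorker() {
        if (thread_.joinable()) thread_.join();
    }

    // Starts the thread; its init step runs asynchronously.
    void initialize() { thread_ = std::thread(&StubWorker::run, this); }

    // Blocks until the init step of this worker has finished.
    void waitRunning() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return running_; });
    }

private:
    void run() {
        const int now = ++g_inInit;
        int prev = g_maxOverlap.load();
        while (now > prev && !g_maxOverlap.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));  // pretend init
        --g_inInit;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = true;
        }
        cv_.notify_all();
    }

    std::thread thread_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool running_ = false;
};

// Starts two workers one after the other and returns the peak number of
// threads that were inside the init step at the same time.
int maxInitOverlapWhenSerialized() {
    g_inInit = 0;
    g_maxOverlap = 0;
    StubWorker preview;
    StubWorker jpeg;
    preview.initialize();
    preview.waitRunning();  // preview init is complete before jpeg starts
    jpeg.initialize();
    jpeg.waitRunning();
    return g_maxOverlap.load();
}
```

With the waitRunning() call between the two initialize() calls, the init steps cannot overlap; dropping that call would let both threads enter the init step together, which is the situation suspected in 09_camera_jpeg_capture.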

please create single stream and call createNvBuffer() and copyToNvBuffer().

09_camera_jpeg_capture/main.cpp already calls acquireFrame/createNvBuffer/copyToNvBuffer on 2 threads simultaneously.
Are you saying that 09_camera_jpeg_capture/main.cpp should be changed
so that one thread calls acquireFrame() and then calls copyToNvBuffer() twice
and another thread does not call acquireFrame() at all?

Could you, please, post a valid version of 09_camera_jpeg_capture/main.cpp for reference?

Thank you

Hi,
You can enable this option to run the 09 sample as a single stream:

--disable-jpg Disable JPEG encode [Default Enable]