nvargus-daemon freeze/hang on pipeline stop on R32.1

I need a patch for 32.1 since the driver isn't supported on 32.2 (I will double-check). Can you send me one?

NVIDIA has a bug. It smells like a timing one.

I need two pipelines to work, not one. AFAICT, after speaking with the GStreamer team, this is entirely an issue with nvarguscamerasrc/nvargus-daemon flushing buffers on pipeline shutdown. Under no circumstances should set_state(Gst.State.NULL) cause a hang. And the fact that nvargus spews a bunch of errors and/or crashes is not the right behavior, regardless of what client scripts do.
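To be concrete about the stop path I mean, here is a minimal sketch (the pipeline string is a placeholder, not my real recording pipeline; my actual scripts use two pipelines and record to files):

# Minimal stop-path sketch: PLAYING for a few seconds, then NULL.
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch('nvarguscamerasrc ! fakesink')  # placeholder pipeline
pipeline.set_state(Gst.State.PLAYING)
time.sleep(5)                        # capture for a few seconds
pipeline.set_state(Gst.State.NULL)   # this call should never hang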

I really, really need a patch for this. Can you help me?

Hi,
The modification is significant and we have concerns about offering a patch, since it has more potential to harm stability.
[url]https://devtalk.nvidia.com/default/topic/1051362/jetson-tx2/bug-in-nvarguscamerasrc-with-gstrtspserver/post/5350039/#5350039[/url]
We can try to have further contact and cooperation on it.

Alright, but this is a show stopper. I literally have no way to make my application work unless you can offer me a workaround.

Is there a beta patch I could at least try just to confirm that the r32.2 patch really fixes it?

EDIT: I will confirm with the vendor (Leopard Imaging) if they have a driver available for r32.2 for my sensor as well.

I sent some email via the contact page as per the other thread.

The vendor won't have a driver until the end of the year at the earliest. We are officially stuck unless you can facilitate some kind of patch or workaround for this bug.

Would polling the GStreamer bus work instead of relying on asynchronous messaging via the main event loop? I only suggest this since gst-launch-1.0 actually polls (but again, all of that is in C, which also complicates things if this is indeed timing related).
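This is the kind of polling loop I have in mind (a sketch, not my actual code; the pipeline string is a placeholder):

# Sketch: poll the bus directly, like gst-launch-1.0 does, instead of
# attaching an async watch to a GLib main loop.
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch('nvarguscamerasrc ! fakesink')
bus = pipeline.get_bus()
pipeline.set_state(Gst.State.PLAYING)

while True:
    # Block for up to 100 ms waiting for EOS or ERROR on the bus.
    msg = bus.timed_pop_filtered(
        100 * Gst.MSECOND, Gst.MessageType.EOS | Gst.MessageType.ERROR)
    if msg is not None:
        break

pipeline.set_state(Gst.State.NULL)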

Hi,
We are checking it. Since the change is significant, it will take some time to verify.

Thanks! Please keep me posted.

Can I please get an update?

Hi,

We are verifying it.

I’d like an ETA on the fix and I want to verify that you are using my script (or something equivalent) to verify the patch. Thanks.

@DaneLLL: Bump?

Hi,
Please try the attachment.
Thorough tests were performed on r32.2. We still strongly suggest users upgrade to that version.
r32_1_TEST_libgstnvarguscamerasrc.zip (22.7 KB)

I’d love to but our vendor doesn’t support r32.2 yet.

The library hangs and segfaults when setting the pipeline to PLAYING. It also seems to hang when one tries to get the current pipeline state (get_state()), etc.

Can you please test my script with your patch on R32.1?

Hi,
We have verified the script. One difference is that the resolution was modified to 2592x1944 since we don't have camera boards supporting 4K. Please check the md5sum:

$ md5sum libgstnvarguscamerasrc.so
676024de317084d2e11fefb5d7d92e0a  libgstnvarguscamerasrc.so

The camera's resolution should have nothing to do with this. This is simply a matter of nvarguscamerasrc tracking the GStreamer bus/pipeline state correctly.

If you restart an existing pipeline that has been stopped, it just crashes/hangs. So set the pipeline's state to PLAYING, sleep a few seconds to record some frames, then set it to either NULL or PAUSED/READY after catching an EOS event, wait a few seconds to settle, then put it in the PLAYING state again. Boom! (See the sketch after the note below.)

NOTE: After this happens, nvargus-daemon also freezes and needs to be restarted to even initialize a new pipeline again.
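Roughly, the sequence is (a sketch with a placeholder pipeline; my real script records to a file):

# Repro sketch: PLAYING -> EOS -> NULL -> PLAYING again on the same pipeline.
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch('nvarguscamerasrc ! fakesink')
bus = pipeline.get_bus()

pipeline.set_state(Gst.State.PLAYING)
time.sleep(5)                                   # record some frames

pipeline.send_event(Gst.Event.new_eos())        # ask the pipeline to finish
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
time.sleep(2)                                   # let things settle

pipeline.set_state(Gst.State.PLAYING)           # hang/crash happens around here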

I'm checking whether recreating the pipeline from scratch on every restart is a viable workaround. But you still have some serious bugs here.

If you want to reproduce this, modify my initial test script to flip the pipeline state back and forth a few times. nvargus-daemon and the application will hang/crash almost immediately.

Again, my application is a recorder program that the user can stop at any moment and restart at will. This worked fine on R28.2.1.

Recreating the pipeline from scratch every time seems to be a viable workaround. But there seems to be no way to restart an existing pipeline after an EOS event is caught and the state changes to READY or NULL.
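For reference, the workaround I'm testing looks roughly like this (a sketch; the launch string stands in for my real recording pipeline):

# Workaround sketch: never reuse a stopped pipeline; build a new one per session.
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
LAUNCH = 'nvarguscamerasrc ! fakesink'   # placeholder for the real recording pipeline

def record_once(seconds):
    pipeline = Gst.parse_launch(LAUNCH)
    bus = pipeline.get_bus()
    pipeline.set_state(Gst.State.PLAYING)
    time.sleep(seconds)
    pipeline.send_event(Gst.Event.new_eos())
    bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)   # drop it; the next session gets a fresh pipeline

record_once(5)
record_once(5)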

Hi,
Please share another script so that we can simply apply it to run the usecase.

https://drive.google.com/file/d/1IWqQOeVqt-sifhWBjZnCBK1bjePJtKev/view?usp=sharing

The script starts two recording sessions, waits a few seconds, sends EOS, catches the EOS and sets the pipeline to the NULL state, then waits a second, restarts recording by setting the pipeline back to PLAYING, then quits.

When the pipeline is set to PLAYING for the second time, I get an immediate EOS event sent to the bus. Why? Also, everything after that starts to fall apart (nvargus-daemon just hangs).

NEXT:

Occasionally I'm seeing nvargus-daemon just SEGV/hang when:

  1. Two simultaneous connections are made to the daemon
  2. The user has an error in their pipeline syntax (why does nvargus-daemon just crash if the client can't instantiate a pipeline correctly? That's really bad behavior).

(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 266)
(Argus) Error EndOfFile: Receive worker failure, notifying 1 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 340)
(Argus) Error InvalidState: Argus client is exiting with 1 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 357)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 368)
(Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error EndOfFile: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 87)
WARNING Argus: 10 client objects still exist during shutdown:
548218175816 (0x7f6800dff8)
548218176168 (0x7f60008ef8)
548221039440 (0x7f68001770)
548238187840 (0x7f60000c80)
548238188000 (0x7f680017f0)
548238188208 (0x7f60000d00)
548238193328 (0x7f68001930)
548238193680 (0x7f60000e20)
548238197616 (0x7f600026a0)
548238206896 (0x7f6800df20)

This also occurs if you shut down an application while a stream is running. If these are just warnings (stale sockets), then say so; otherwise this is very confusing.
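For reference, the kind of clean teardown I'd expect to avoid those messages on a normal exit looks roughly like this (a sketch; the pipeline string is a placeholder):

# Sketch: on SIGINT, drain with EOS and set NULL before the process exits,
# so the Argus client side detaches before shutdown.
import signal
import sys
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch('nvarguscamerasrc ! fakesink')  # placeholder pipeline
bus = pipeline.get_bus()

def shutdown(signum, frame):
    pipeline.send_event(Gst.Event.new_eos())
    bus.timed_pop_filtered(5 * Gst.SECOND, Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)
    sys.exit(0)

signal.signal(signal.SIGINT, shutdown)
pipeline.set_state(Gst.State.PLAYING)
signal.pause()   # run until Ctrl-C triggers the clean shutdown above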