pthread_mutex_lock assertion when calling cuGraphicsEGLRegisterImage from multiple threads

Hi,

This problem is the same as the one in https://devtalk.nvidia.com/default/topic/1037435/pthread-mutex-lock-when-include-cugraphicseglregisterimage-call/#5302615

I call cuGraphicsEGLRegisterImage() from multiple threads, and each thread has to initialize some buffers. The program then crashes with this log: pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.

Or it gets stuck. If no error occurs, please run it a few more times.

If I don't initialize the buffers, it runs fine.

The demo is in the attachment. You will need to change the camera URL yourself.

Thanks.
test.tar.gz (8.29 KB)

Hi,

You can use this Makefile, because the first one was generated from a Qt .pro file.

Thanks
Makefile.tar.gz (771 Bytes)

Hi,

We will check this sample and update you later.
Thanks.

Hi,

We tried to compile your sample but hit an error with cudaEGL.h:

eglframeconsumer.h:7:10: fatal error: cudaEGL.h: No such file or directory
 #include <cudaEGL.h>
          ^~~~~~~~~~~
compilation terminated.

May I know where this file comes from? Is it from Tegra EGL?
Thanks.

Hi,

Yes, it is from /usr/local/cuda-9.0/include/ in JetPack 3.2.

thanks

Hi,

Sorry, we don't have an IP camera environment.
Would you mind updating your sample to use a USB or on-board camera?

Thanks.

Hi,

Can you change the code? I don't know how to use a USB or on-board camera with GStreamer. Or is there a demo for TX2?

Thanks.

Hi, AastaLLL

Could you find an IP camera? If you don't have one, tell me your address and I can send you an IP camera by SF Express.

Hi,
Please modify your test sample to use local video playback so that we can try to reproduce the issue.

filesrc ! decodebin ! nvvidconv ! videosink
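The pipeline above is only an outline: `videosink` stands in for whatever sink the application uses, and `filesrc` needs its `location` property set. A minimal sketch of building that description string follows. The helper name `build_file_pipeline` is hypothetical, and it assumes the path contains no spaces or characters that need quoting.

```cpp
#include <string>

// Builds a gst-launch style description for local file playback.
// `sink` is supplied by the caller because the thread only names a
// placeholder ("videosink"); this helper is illustrative, not from the sample.
std::string build_file_pipeline(const std::string& path, const std::string& sink) {
    // filesrc reads the file named by its "location" property.
    return "filesrc location=" + path + " ! decodebin ! nvvidconv ! " + sink;
}
```

For example, `build_file_pipeline("/home/nvidia/Pictures/test.mp4", "fakesink")` produces a string that could be handed to `gst_parse_launch()`.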

Hi, AastaLLL, DaneLLL

I have modified the sample to use a local video. Change the video URL yourself:

string url = "/home/nvidia/Pictures/test.mp4";

Please run the sample and tell me whether you can reproduce the issue.

Thx!

testVideo.tar.gz (6.3 KB)

Hi,

We hit another error when testing your application:

...
Link pads from element decodebin4 to nvvconv2.
Link pads from element decodebin0 to nvvconv3.
test: eglframeconsumer.cpp:78: CUeglFrame* EGLFrameConsumer::fetch(): Assertion `eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_RGBA' failed.
test: eglframeconsumer.cpp:78: CUeglFrame* EGLFrameConsumer::fetch(): Assertion `eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_RGBA' failed.
Aborted (core dumped)

Do you also see this error in your environment?

Thanks.

Hi,

We do not see this error. You can comment out or delete lines 77-82 in eglframeconsumer.cpp; that is our own assertion on the output format. Or you can try an .mp4 file.

Thanks.

Hi,

We still cannot reproduce this issue.
The application hangs a few seconds after it starts.

NvMMLiteBlockCreate : Block : BlockType = 260 
Allocating new output: 1280x720 (x 10), ThumbnailMode = 0
OPENMAX: HandleNewStreamFormat: 3528: Send OMX_EventPortSettingsChanged: nFrameWidth = 1280, nFrameHeight = 720 
Link pads from element decodebin1 to nvvconv5.
do init.
do init.
do init.
do init.
do init.
do init.
do init.
do init.

We will test this sample on another platform to double check.
Thanks.

Hi,

The sample does get stuck. By "hang", do you mean "stuck"? I think it may be a deadlock. This is just one of the symptoms. If you run it more times, or create more threads, you will also see "pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed."

Thanks.

Hi,

Thanks for the information.
Let us check whether we can also hit the __pthread_mutex_lock_full assertion in the end.

Hi,
As of now we cannot reproduce the failure. Can you please try JetPack 3.2.1 or JetPack 3.3?
For TX2, we no longer support r28.2 (JetPack 3.2).

Hi,

We have tried JetPack 3.3, and the same thing happens there. Can you explain why the program gets stuck? When the program crashes, we can restart it, but when it gets stuck, we cannot tell that it is in an abnormal state and restart it.
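One generic way to notice a stuck worker, independent of the root cause, is a heartbeat that the worker updates on every iteration and a monitor that checks how old it is. The following is a minimal sketch under that assumption. The names `Heartbeat` and `watch_until_stale` are hypothetical and are not part of the attached sample or of any NVIDIA API.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Records the last time a worker thread reported progress.
class Heartbeat {
public:
    Heartbeat() : last_ns_(now_ns()) {}

    // Called by the worker once per loop iteration (e.g. per frame).
    void beat() { last_ns_.store(now_ns(), std::memory_order_relaxed); }

    // True when no beat() has been seen for longer than `timeout`.
    bool is_stale(std::chrono::milliseconds timeout) const {
        const std::int64_t age_ns = now_ns() - last_ns_.load(std::memory_order_relaxed);
        return age_ns > std::chrono::duration_cast<std::chrono::nanoseconds>(timeout).count();
    }

private:
    static std::int64_t now_ns() {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
    }
    std::atomic<std::int64_t> last_ns_;
};

// Polls the heartbeat up to `max_polls` times. Returns true as soon as it
// is stale, or false if it stayed fresh for every check.
bool watch_until_stale(const Heartbeat& hb, std::chrono::milliseconds timeout,
                       std::chrono::milliseconds poll_interval, int max_polls) {
    for (int i = 0; i < max_polls; ++i) {
        if (hb.is_stale(timeout)) return true;
        std::this_thread::sleep_for(poll_interval);
    }
    return false;
}
```

In this sketch each decoding thread would call `beat()` after every processed frame, and a monitor thread would react when `watch_until_stale()` returns true, for instance by calling `std::abort()` so that an external supervisor restarts the process. The timeout has to be longer than the slowest legitimate iteration.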

Thanks.

Hi,

There is a possible cause, but it still needs further investigation:
The deadlock occurs when tid1 reads the owner ID while another thread is just updating the owner field.

1. Two threads compete for a userspace mutex, and their TIDs differ only in the lowest byte:
For example:
tid1 = 0xaabbcc00
tid2 = 0xaabbcc01

2. Scenario
STEP1. Thread2 tid=0xaabbcc01 acquires the mutex
STEP2. Thread1 tid=0xaabbcc00 attempts to acquire mutex
STEP3. libpthread detects the mutex is not free
STEP4. libpthread invokes FUTEX_LOCK_PI from tid=0xaabbcc00
STEP5. Thread2 tid=0xaabbcc01 releases the mutex

3. Race occurs
STEP1. mutex tid is 0xaabbcc01
STEP2. CPU X starts reading the userspace mutex owner field byte by byte
STEP3. CPU Y updates the 4 bytes of the mutex tid to 0x00000000 <- the update lands between CPU X's reads of the owner field
STEP4. CPU X reads remaining bytes of new value
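The race above can be illustrated with a small deterministic model. This is a sketch of the idea only, not kernel or libpthread code. It assumes CPU X reads the owner field from the most significant byte down, so that three bytes come from the old value and the last byte from the new one. Under that assumption the value CPU X ends up with is 0xaabbcc00, which is exactly tid1, so the waiting thread looks as if it already owns the mutex. That would be consistent with the EDEADLK in the assertion message. The function names are hypothetical.

```cpp
#include <cstdint>

// Deterministic model of a non-atomic ("torn") 32-bit read.
// Bytes are read from most significant to least significant (an assumption
// made for illustration). The first `bytes_before_update` bytes come from
// the old value, the remaining ones from the new value.
std::uint32_t torn_read(std::uint32_t old_value, std::uint32_t new_value,
                        int bytes_before_update) {
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const int shift = 8 * (3 - i);
        const std::uint32_t source = (i < bytes_before_update) ? old_value : new_value;
        result |= source & (0xFFu << shift);  // copy one byte from the chosen source
    }
    return result;
}

// True when the observed owner value matches the thread that is waiting,
// i.e. the waiter would appear to hold the mutex already.
bool looks_self_owned(std::uint32_t observed_owner, std::uint32_t waiter_tid) {
    return observed_owner == waiter_tid;
}
```

With the example TIDs, `torn_read(0xaabbcc01, 0x00000000, 3)` yields 0xaabbcc00, and `looks_self_owned` returns true for tid1 = 0xaabbcc00. A read that is not torn yields either 0xaabbcc01 or 0x00000000, and neither matches tid1.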

We are still checking internally whether there is any possible solution. We will update you later.

Thanks.

Hi, AastaLLL

We have found another problem with cuEGL, reported in https://devtalk.nvidia.com/default/topic/1045889/cueglstreamconsumerconnect-and-cueglstreamconsumerdisconnect-function-would-stuck-in-multithread-frequent-call-/#5307106

In multithreaded use, cuEGLStreamConsumerConnect and cuEGLStreamConsumerDisconnect may also hang when we call them frequently. In our project cameras can be added or removed at runtime, so we have to connect and disconnect like this.

Thanks.

Hi,

This attachment shows that when we run the program under gdb, it crashes rather than getting stuck. It seems to happen in omxh264dec?

Thanks.