pthread mutex lock assertion when a cuGraphicsEGLRegisterImage call is included

carcher18a4s · July 11, 2018, 5:29pm

We are currently developing software for a two-camera system using the Leopard Imaging IMX274 and the Leopard Imaging three-camera expansion board mounted on a TX2 development board running the 28.2 OS. Our application accepts streams from the two cameras, combines them into a single frame, and sends that frame to the hardware encoder. This code was working prior to the upgrade to the 28.0 version. We see a pthread mutex lock assertion failure after three frames are received from the camera. By gradually removing parts of the program, we have traced the problem to the cuGraphicsEGLRegisterImage call used to map an EGLImageKHR into CUDA space. The assertion fires once this call is added to the processing. All further accesses to the resulting CUeglFrame have been commented out, so this call alone is triggering the failure. A code stub is shown below.

bool ColorConsumerThread::createMagImage(void * psink, CUeglFrame * sourceframe, float mag, float aimxpix, float aimypix, int * strides)
{
    CUDA_RESOURCE_DESC cudaResDesc;
    CUDA_TEXTURE_DESC cudaTexDesc;
    CUgraphicsResource zoomResource = NULL;
    CUresult cuResult;
    const char * errorString;
    EGLImageKHR * sinkframe = (EGLImageKHR *)psink;

    float uvx, uvy;   // center of U and V planes for 4:2:0 image
    int ystride, uvstride;
    uvx = (2.0f * aimxpix - 1.0f)/4.0f;
    uvy = (2.0f * aimypix - 1.0f)/4.0f;
    ystride = *strides;
    uvstride = *(strides+1);

    // Register the output frame with cuda, so it can be used as a destination buffer for the
    // magnified frame.
    cuResult = cuGraphicsEGLRegisterImage(&zoomResource, *sinkframe, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (cuResult != CUDA_SUCCESS)
    {
        cuGetErrorString(cuResult, &errorString);
        fprintf(stderr, "Error: unable to register zoom buffer as graphics resource %s\n", errorString);
        return false;
    }
    CUeglFrame zoomEglFrame;
    cuResult = cuGraphicsResourceGetMappedEglFrame(&zoomEglFrame, zoomResource, 0, 0);
    if (cuResult != CUDA_SUCCESS)
    {
        cuGetErrorString(cuResult, &errorString);
        fprintf(stderr, "Error: unable to get zoom frame in cuda EGL format %s\n", errorString);
        // Release the registration before bailing out so it is not leaked on this path.
        cuGraphicsUnregisterResource(zoomResource);
        return false;
    }
    // width, height and planeCount are unsigned in CUeglFrame; the color format is an enum.
    fprintf(stdout, "frame size : %u x %u  with format %d and %u planes \n", zoomEglFrame.width, zoomEglFrame.height, (int)zoomEglFrame.eglColorFormat, zoomEglFrame.planeCount);
:
:
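
The chroma-center arithmetic in the stub can be checked in isolation. The sketch below is only an illustration: the helper name `lumaToChromaCenter` and the struct `ChromaCenter` are hypothetical and not part of the original code, and it assumes (my reading of the comment in the stub) that in 4:2:0 data chroma sample c sits midway between luma columns 2c and 2c+1, so that c = (x - 0.5)/2 = (2x - 1)/4. The input values are the stream center reported in the log for a 3840 x 2160 stream.

```cpp
#include <cassert>

// Hypothetical helper, not from the original post: converts a luma-plane
// coordinate to the matching U/V-plane coordinate for 4:2:0 data using the
// same expression as the stub, (2*x - 1) / 4.
struct ChromaCenter {
    float x;
    float y;
};

inline ChromaCenter lumaToChromaCenter(float aimXPix, float aimYPix)
{
    ChromaCenter c;
    c.x = (2.0f * aimXPix - 1.0f) / 4.0f;
    c.y = (2.0f * aimYPix - 1.0f) / 4.0f;
    return c;
}

// Checks the expression against the stream center printed in the log
// (1919.50 x 1079.50). The expected values are the centers of a
// 1920 x 1080 chroma plane.
inline void checkLogExample()
{
    const ChromaCenter c = lumaToChromaCenter(1919.5f, 1079.5f);
    assert(c.x == 959.5f);
    assert(c.y == 539.5f);
    (void)c;  // keeps the variable used when asserts are compiled out
}
```

Calling `checkLogExample()` from any `main` exercises the check. The result lands on the center of the half-resolution plane, which is what the comment in the stub describes.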

The status information from the run is as follows:

nvidia@tegra-ubuntu:~/workspace/ScopeTestbug$ ./scope_test 1
Default status: zoom 1.000000
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: No module data found
OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
Argus Version: 0.96.2 (single-process)
number of AE regions for far camera 64
Color Consumer thread ID 547537396192
cuda consumer connected to far color stream
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
Success: created Video encoder
Created NvBuffer 0 with fd 1828717628
Created NvBuffer 1 with fd 1828717629
Created NvBuffer 2 with fd 1828717630
Created NvBuffer 3 with fd 1828717631
Created NvBuffer 4 with fd 1828717634
Created NvBuffer 5 with fd 1828718022
Waiting until producers are connected…
Producers are connected, continuing…
===== MSENC blits (mode: 1) into tiled surfaces =====
Dequeued buffer with fd 1828717628
SCF: Error InvalidState: NonFatal ISO BW requested not set. Requested = 2147483647 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 653)
frame number 1
frame size : 1280 x 720 with format 0 and 3 planes
stream size : 3840 x 2160 with center 1919.50 x 1079.50 and magnification 0.415146
Dequeued buffer with fd 1828717629
frame number 2
frame size : 1280 x 720 with format 0 and 3 planes
stream size : 3840 x 2160 with center 1919.50 x 1079.50 and magnification 0.415146
Dequeued buffer with fd 1828717630
frame number 3
scope_test: pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.
Aborted (core dumped)
nvidia@tegra-ubuntu:~/workspace/ScopeTestbug$
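
Read literally, the assertion at the end of the log fails when the lock call sees EDEADLK for a mutex whose kind is error-checking or recursive. For orientation only, here is a minimal sketch of the ordinary, documented EDEADLK case in POSIX threads: relocking an error-checking mutex from the thread that already owns it. It does not reproduce the crash and makes no claim about what the NVIDIA libraries do internally. The function name `relockErrorCheckMutex` is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Locks an error-checking mutex twice from the same thread and returns the
// result of the second attempt. POSIX specifies EDEADLK for that case.
inline int relockErrorCheckMutex()
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    const int first = pthread_mutex_lock(&mutex);   // first lock succeeds
    assert(first == 0);
    (void)first;  // keeps the variable used when asserts are compiled out

    // Same thread, already the owner: reported as an error, no deadlock.
    const int second = pthread_mutex_lock(&mutex);

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

In this well-behaved case the error is returned to the caller. In the log above, glibc aborts instead, so the mutex was in a state its lock path does not expect.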

We can provide a reduced code sample that triggers the error, if you can let us know where to send it. We do ITAR-restricted work, so we cannot publish the code on the forum.

Thank you for your assistance.

ShaneCCC · July 12, 2018, 7:23am

Could you try argus_camera with multiple sessions to make sure the two sensors work together normally?

carcher18a4s · July 12, 2018, 3:22pm

I assume you mean the argus “multiSensor” sample. Yes it runs correctly. We have also verified that we can stream both cameras by running two simultaneous sessions of “argus_camera”.

carcher18a4s · July 12, 2018, 3:51pm

Or did you mean the Multi session module in argus camera? In that case it does not work. I get a good still image on the display, but it is not active video. It reports a socket error.

Executing Argus Sample Application (argus_camera)
Argus Version: 0.96.2 (multi-process)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 214)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 317)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 101)

To clarify a bit, we are currently running with just a single camera in order to validate the full processing pipe. So we should not be running into multi-camera problems just yet.

ShaneCCC · July 13, 2018, 5:55am

Could you confirm whether the cudaHistogram sample works well? (tegra_multimedia_api/argus/sample/cudaHistogram)

carcher18a4s · July 13, 2018, 3:30pm

Yes, cudaHistogram works fine.

carcher18a4s · July 13, 2018, 11:28pm

I have a secondary question for you…
We are trying to write data into a destination buffer from the GPU using the mapping approach given in the 03_video_cuda_enc example:
EGLImageKHR eglImage = NvEGLImageFromFd(display, fd);
CUgraphicsResource pResource;
cuGraphicsEGLRegisterImage(&pResource, eglImage, …)
CUeglFrame eglFrame;
cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);

cuGraphicsUnregisterResource(pResource);
NvDestroyEGLImage(display, eglImage);
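
One way to keep every cuGraphicsEGLRegisterImage paired with a cuGraphicsUnregisterResource, including on early-return error paths, is a small scope guard. The sketch below shows only the pattern. `fakeRegister`, `fakeUnregister`, `processFrame` and `outstandingRegistrations` are hypothetical stand-ins so the example compiles without the CUDA and Jetson headers; in real code the guard's cleanup would call cuGraphicsUnregisterResource and NvDestroyEGLImage. Whether an unbalanced registration has anything to do with the failure described here is not established.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the stored cleanup exactly once, when the guard leaves scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeExit() { cleanup_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    std::function<void()> cleanup_;
};

// Hypothetical stand-ins for the register/unregister pair. They only count
// how many registrations are currently outstanding.
static int g_outstanding = 0;

bool fakeRegister()
{
    ++g_outstanding;
    return true;
}

void fakeUnregister()
{
    assert(g_outstanding > 0);
    --g_outstanding;
}

int outstandingRegistrations() { return g_outstanding; }

// Mirrors the shape of a per-frame function with an early return.
bool processFrame(bool mappingSucceeds)
{
    if (!fakeRegister()) {
        return false;                 // nothing registered, nothing to undo
    }
    ScopeExit unregister([] { fakeUnregister(); });

    if (!mappingSucceeds) {
        return false;                 // guard still unregisters on this path
    }
    // ... work on the mapped frame would go here ...
    return true;
}
```

After `processFrame(true)` or `processFrame(false)`, `outstandingRegistrations()` is back to zero, so a failed mapping step cannot leave a registration behind.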

The bug appears on the cuGraphicsEGLRegisterImage call. If we do not access the “eglImage”, we get the pthread mutex error described above. If we try to access the “eglFrame” plane data, we get a segmentation fault, although we can read and print the width, height, etc.

To work around this, we are writing the data into a temporary CUDA array and copying it to the mapped NvBuffer planes using the NvBufferMemMap, NvBufferMemSyncForCpu, do stuff, NvBufferMemSyncForDevice, NvBufferMemUnMap sequence. This works, and we have been able to validate the rest of our processing pipe and add a second camera. However, we take the hit of a GPU-to-CPU copy, which we would rather avoid, since this is a low-latency application.

Looking at the NvBuffer utilities, we are wondering if there is a way to map the planes so we can write directly into them from our GPU kernel function. Nothing jumped out as obvious on reading the documentation and header files, but we could easily be missing something.

Does anyone have an alternative to the EGL mapping approach that would allow us to write directly into the NvBuffer planes from the GPU?

Thanks!

AastaLLL · July 17, 2018, 8:03am

Hi,

Could you check the dqBuffer function in [tegra_multimedia_api]/samples/common/classes/NvV4l2ElementPlane.cpp?

Thanks.

carcher18a4s · July 17, 2018, 4:24pm

I don’t understand your question. Could you elaborate a bit? Thanks!

AastaLLL · July 23, 2018, 7:43am

Hi,

Sorry for the unclear comment.
Could you check whether the function shared in #8 fits your request?

Thanks.

carcher18a4s · August 7, 2018, 5:58pm

Hi,

Sorry for the late reply. I have been pulled off this work to chase a problem with the TX2 Ethernet. It will probably be a few weeks before I get back to this work. Thanks for the suggestion, we will try it out just as soon as possible.

We would like to get some resolution on the bug we found. Does Nvidia have a “bug reporting” site or process beyond the forum? Is there a way to register with Nvidia, so we will be informed when you have a fix?

Thanks.

kayccc · August 9, 2018, 7:34am

Hi carcher18a4s,

Please file the bug you found directly on the forum.
We will try to reproduce it to confirm, and then plan the next steps.

Thanks

ClancyLian · December 10, 2018, 3:24am

Hi,

I also hit this problem when using cuGraphicsEGLRegisterImage() from multiple threads. In every thread I have to initialize some buffers using the code below:

if (-1 == NvBufferCreate(&fd, fcp.img.width, fcp.img.height,
                         NvBufferLayout_BlockLinear, NvBufferColorFormat_YUV420)) {
    dbgError("Create nvbuffer failed.\n");
    throw;
}

display = EGLDisplayAccessor::getInstance();
eglImage = NvEGLImageFromFd(display, fd);

CUgraphicsResource resource;
CUresult status;
// Common idiom: forces the CUDA runtime to initialize its context on this
// thread before the driver API calls below.
cudaFree(0);
status = cuGraphicsEGLRegisterImage(&resource, eglImage, CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD);
if (status != CUDA_SUCCESS) {
    dbgError("cuGraphicsEGLRegisterImage failed: %d.\n", status);
    throw;
}
status = cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);
if (status != CUDA_SUCCESS) {
    dbgError("cuGraphicsResourceGetMappedEglFrame failed: %d.\n", status);
    throw;
}

It crashes with this log: pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.

Thanks.

AastaLLL · December 14, 2018, 3:32am

Hi, ClancyLian

This issue appears to be a duplicate of topic 1045151:
https://devtalk.nvidia.com/default/topic/1045151

We will track further updates on that topic.
Please correct me if they are not the same.

Thanks.
