pthread mutex lock assertion when including a cuGraphicsEGLRegisterImage call

We are currently developing software for a two-camera system using the Leopard Imaging IMX274 and the Leopard Imaging three-camera expansion board mounted on a TX2 development board running the 28.2 OS. Our application accepts streams from the two cameras and combines them into a single frame, which it then sends to the hardware encoder. This code was working prior to the upgrade to the 28.0 version. We now see a pthread mutex lock assertion failure after three frames are received from the camera. By gradually removing parts of the program, we have traced the problem to the cuGraphicsEGLRegisterImage call used to map an EGLImageKHR into CUDA space. The mutex lock assertion fires as soon as this call is added to the processing. All further accesses to the resulting CUeglFrame have been commented out, so this call alone is triggering the failure. A code stub is shown below.

bool ColorConsumerThread::createMagImage(void * psink, CUeglFrame * sourceframe, float mag, float aimxpix, float aimypix, int * strides)
{
    CUDA_RESOURCE_DESC cudaResDesc;
    CUDA_TEXTURE_DESC cudaTexDesc;
    CUgraphicsResource zoomResource = NULL;
    CUresult cuResult;
    const char * errorString;
    EGLImageKHR * sinkframe = (EGLImageKHR *)psink;

    float uvx, uvy;   // center of U and V planes for 4:2:0 image
    int ystride, uvstride;
    uvx = (2.0f * aimxpix - 1.0f)/4.0f;
    uvy = (2.0f * aimypix - 1.0f)/4.0f;
    ystride = *strides;
    uvstride = *(strides+1);

    // Register the output frame with cuda, so it can be used as a destination buffer for the
    // magnified frame.
    cuResult = cuGraphicsEGLRegisterImage(&zoomResource, *sinkframe, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (cuResult != CUDA_SUCCESS)
    {
        cuGetErrorString(cuResult, &errorString);
        fprintf(stderr, "Error: unable to register zoom buffer as graphics resource %s\n", errorString);
        return false;
    }
    CUeglFrame zoomEglFrame;
    cuResult = cuGraphicsResourceGetMappedEglFrame(&zoomEglFrame, zoomResource, 0, 0);
    if (cuResult != CUDA_SUCCESS)
    {
        cuGetErrorString(cuResult, &errorString);
        fprintf(stderr, "Error: unable to get zoom frame in cuda EGL format %s\n", errorString);
        return false;
    }
    fprintf(stdout, "frame size : %d x %d  with format %d and %d planes \n", zoomEglFrame.width, zoomEglFrame.height, zoomEglFrame.eglColorFormat, zoomEglFrame.planeCount);
:
:

The status information from the run is as follows:

nvidia@tegra-ubuntu:~/workspace/ScopeTestbug$ ./scope_test 1
Default status: zoom 1.000000
NvPclHwGetModuleList: No module data found
NvPclHwGetModuleList: No module data found
OFParserGetVirtualDevice: NVIDIA Camera virtual enumerator not found in proc device-tree
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16---- imager: Found override file [/var/nvidia/nvcam/settings/camera_overrides.isp]. ----
Argus Version: 0.96.2 (single-process)
number of AE regions for far camera 64
Color Consumer thread ID 547537396192
cuda consumer connected to far color stream
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
875967048
842091865
Success: created Video encoder
Created NvBuffer 0 with fd 1828717628
Created NvBuffer 1 with fd 1828717629
Created NvBuffer 2 with fd 1828717630
Created NvBuffer 3 with fd 1828717631
Created NvBuffer 4 with fd 1828717634
Created NvBuffer 5 with fd 1828718022
Waiting until producers are connected…
Producers are connected, continuing…
===== MSENC blits (mode: 1) into tiled surfaces =====
Dequeued buffer with fd 1828717628
SCF: Error InvalidState: NonFatal ISO BW requested not set. Requested = 2147483647 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 653)
frame number 1
frame size : 1280 x 720 with format 0 and 3 planes
stream size : 3840 x 2160 with center 1919.50 x 1079.50 and magnification 0.415146
Dequeued buffer with fd 1828717629
frame number 2
frame size : 1280 x 720 with format 0 and 3 planes
stream size : 3840 x 2160 with center 1919.50 x 1079.50 and magnification 0.415146
Dequeued buffer with fd 1828717630
frame number 3
scope_test: pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.
Aborted (core dumped)
nvidia@tegra-ubuntu:~/workspace/ScopeTestbug$

We can provide a reduced code sample that triggers the error if you let us know where to send it. We do ITAR-restricted work, so we cannot publish the code on the forum.

Thank you for your assistance.

Could you try argus_camera in multi-session mode to make sure these two sensors are working together normally?

I assume you mean the argus “multiSensor” sample. Yes it runs correctly. We have also verified that we can stream both cameras by running two simultaneous sessions of “argus_camera”.

Or did you mean the Multi session module in argus camera? In that case it does not work. I get a good still image on the display, but it is not active video. It reports a socket error.

Executing Argus Sample Application (argus_camera)
Argus Version: 0.96.2 (multi-process)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 214)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 317)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState: (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 101)

To clarify a bit, we are currently running with just a single camera in order to validate the full processing pipe. So we should not be running into multi-camera problems just yet.

Could you confirm whether the cudaHistogram sample works well? (tegra_multimedia_api/argus/sample/cudaHistogram)

Yes, cudaHistogram works fine.

I have a secondary question for you…
We are trying to write data into the destination buffer from the GPU using the mapping approach given in the 03_video_cuda_enc example.
EGLImageKHR eglImage = NvEGLImageFromFd(display, fd);
CUgraphicsResource pResource;
cuGraphicsEGLRegisterImage(&pResource, eglImage, …)
CUeglFrame eglFrame;
cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);

cuGraphicsUnregisterResource(pResource);
NvDestroyEGLImage(display, eglImage);

The bug appears on the cuGraphicsEGLRegisterImage call. If we do not access the “eglImage”, we get the pthread mutex error described above. If we try to access the “eglFrame” plane data, we get a segmentation fault, although we can read and print the width, height, etc.

To work around this, we are writing the data into a temporary CUDA array and copying it to the mapped NvBuffer planes using the sequence NvBufferMemMap, NvBufferMemSyncForCpu, do stuff, NvBufferMemSyncForDevice, NvBufferMemUnMap. This works, and we have been able to validate the rest of our processing pipe and add a second camera. However, we take the hit of a GPU-to-CPU copy, which we would rather avoid, since this is a low-latency application.

Looking at the NvBuffer utilities, we are wondering if there is a way to map the planes so we can write directly into them from our GPU kernel function. Nothing jumped out as obvious on reading the documentation and header files, but we could easily be missing something.

Does anyone have an alternative to the EGL mapping approach that would allow us to write directly into the NvBuffer planes from the GPU?

Thanks!

Hi,

Could you check the dqBuffer function in [tegra_multimedia_api]/samples/common/classes/NvV4l2ElementPlane.cpp?

Thanks.

I don’t understand your question. Could you elaborate a bit? Thanks!

Hi,

Sorry for the unclear comment.
Could you check whether the function I shared in response to your #8 fits your request?

Thanks.

Hi,

Sorry for the late reply. I have been pulled off this work to chase a problem with the TX2 Ethernet. It will probably be a few weeks before I get back to this work. Thanks for the suggestion, we will try it out just as soon as possible.

We would like to get some resolution on the bug we found. Does Nvidia have a “bug reporting” site or process beyond the forum? Is there a way to register with Nvidia, so we will be informed when you have a fix?

Thanks.

Hi carcher18a4s,

Please just report the bug you found directly on the forum.
We will try to reproduce it to confirm, and then plan the next steps.

Thanks

Hi,

I also hit this problem when using cuGraphicsEGLRegisterImage() from multiple threads. In every thread I have to initialize a buffer using the code below:

    if (-1 == NvBufferCreate(&fd, fcp.img.width, fcp.img.height,
                             NvBufferLayout_BlockLinear, NvBufferColorFormat_YUV420)) {
        dbgError("Create nvbuffer failed.\n");
        throw;
    }

    display = EGLDisplayAccessor::getInstance();
    eglImage = NvEGLImageFromFd(display, fd);

    CUgraphicsResource resource;
    CUresult status;
    // Commonly used idiom to force CUDA runtime context initialization on the calling thread.
    cudaFree(0);
    status = cuGraphicsEGLRegisterImage(&resource, eglImage, CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD);
    if (status != CUDA_SUCCESS) {
        dbgError("cuGraphicsEGLRegisterImage failed: %d.\n", status);
        throw;
    }
    status = cuGraphicsResourceGetMappedEglFrame(&frame, resource, 0, 0);
    if (status != CUDA_SUCCESS) {
        dbgError("cuGraphicsResourceGetMappedEglFrame failed: %d.\n", status);
        throw;
    }

It crashes with the log: pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.

Thanks.

Hi, ClancyLian

I believe this issue is a duplicate of topic 1045151:
https://devtalk.nvidia.com/default/topic/1045151

We will track further updates on that topic.
Please correct me if they are not the same issue.

Thanks.