GetIspIOSurfaceFences error while acquiring frames from SIPL

Software Version
DRIVE OS Linux 5.2.6 and DriveWorks 4.0

Hardware Platform
DRIVE AGX Xavier

Hi,

After an unrelated code change, we started getting random "GetIspIOSurfaceFences: Failed to get fence list for output at index 0" errors while running our pipelines when the system load is high and the HW encoder modules are heavily used. Our pipelines start fine, and then the error occurs suddenly; there is nothing interesting in the preceding logs.

If the system load is not that high, this issue doesn't occur.

The full log we get:

Module_id 23 Severity 2 : GetIspIOSurfaceFences: Failed to get fence list for output at index 0
Module_id 23 Severity 2 : ProcessFrame: Failed to set prefences

Module_id 23 Severity 2 : NvMediaISPProcess: Failed to process frame

SIPL_ICP_ISP_11: /dvs/git/dirty/git-master_linux/camera/fusa/sipl/src/core/pipelineMgr/spmgr/pipeline/blocks/CNvMISPBlock.cpp: 525: Process: NvMediaISPProcess failed
SIPL_ICP_ISP_11: /dvs/git/dirty/git-master_linux/camera/fusa/sipl/src/core/pipelineMgr/spmgr/pipeline/CNvMSensorPipeline.cpp: 1088: DoISPProcessing: ISP block process isp failed
[INFO] ../gst-libs/gst/nvmedia/sipl/nvsiplnotificationhandler.cpp LogEvent:84: Notification handler: ISP processing failure.

Could you please help us understand what "Module_id 23 Severity 2 : GetIspIOSurfaceFences: Failed to get fence list for output at index 0" means? Most probably it has something to do with the NvSciSync objects we register with camera->RegisterNvSciSyncObj, but we don't get any synchronization-related errors before this error.

Thank you in advance,
Adam

This may be important: the error log also says "Module_id 23 Severity 2 : ProcessFrame: Failed to set prefences".
We don't use pre-fences with the camera; we call camera->RegisterNvSciSyncObj only with NVMEDIA_EOFSYNCOBJ to set the EOF sync object.

It indicates that the application failed to acquire/release the lock on the mutex that is used to perform fence operations on the ISP0 output image buffer.

Dear @SivaRamaKrishnaNV

It is a DriveOS internal mutex, right? What can cause such an error? All we do in the initialization phase is register the EOF sync objects with camera->RegisterNvSciSyncObj and register the images with camera->RegisterImages. In the runtime phase we just query the new INvSIPLBuffers, then get the EOF fence (buffer->GetEOFNvSciSyncFence) and wait on it. But the error happens during ISP processing, which is completely under the hood and a black box for us.

Do you know what can cause such an error?
Adam

Dear @SivaRamaKrishnaNV,

I think the issue is related to an NvMedia2D/NvSciSyncObj/fence problem.
According to our observations, if more than 12 different NvMedia2D instances are used throughout the whole lifespan of the application, the error mentioned in this thread appears.

Could you please confirm whether there should be such a limitation?

Dear @AdamBalazsVay ,
Are these 12 independent contexts, or 12 tasks submitted with the same context?

12 independent contexts, so 12 NvMedia2D objects created with NvMedia2DCreate.
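
For anyone hitting the same symptom: the observation above is about the total number of NvMedia2D contexts created over the application's lifetime, so one possible mitigation is to reuse a small set of contexts instead of creating new ones. Below is a minimal sketch of that idea, not a confirmed fix. The names CappedContextPool, FakeContext and demo_total_created are hypothetical and used only for illustration; FakeContext stands in for a real NvMedia2D handle, and in a real application the factory would wrap NvMedia2DCreate. The cap of 12 comes from the observation in this thread, not from any documented limit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NvMedia2D context (assumption for this sketch).
struct FakeContext { int id; };

// Pool that never creates more than `cap` contexts over its whole lifetime.
// Released contexts are kept and handed out again instead of being destroyed.
template <typename Ctx>
class CappedContextPool {
public:
    using Factory = std::function<std::unique_ptr<Ctx>()>;

    CappedContextPool(std::size_t cap, Factory factory)
        : cap_(cap), factory_(std::move(factory)) {}

    // Hand out an idle context; create a new one only while under the cap.
    std::unique_ptr<Ctx> acquire() {
        if (!idle_.empty()) {
            std::unique_ptr<Ctx> ctx = std::move(idle_.back());
            idle_.pop_back();
            return ctx;
        }
        if (created_ >= cap_) {
            throw std::runtime_error("context cap reached; release one first");
        }
        std::unique_ptr<Ctx> ctx = factory_();
        ++created_;
        return ctx;
    }

    // Return a context to the pool for later reuse.
    void release(std::unique_ptr<Ctx> ctx) { idle_.push_back(std::move(ctx)); }

    // Number of contexts created so far (never exceeds the cap).
    std::size_t total_created() const { return created_; }

private:
    std::size_t cap_;
    Factory factory_;
    std::size_t created_ = 0;
    std::vector<std::unique_ptr<Ctx>> idle_;
};

// Demo: 100 acquire/release cycles end up creating only one context.
inline std::size_t demo_total_created() {
    int next_id = 0;
    CappedContextPool<FakeContext> pool(12, [&next_id] {
        return std::make_unique<FakeContext>(FakeContext{next_id++});
    });
    for (int i = 0; i < 100; ++i) {
        std::unique_ptr<FakeContext> ctx = pool.acquire();
        assert(ctx != nullptr);
        pool.release(std::move(ctx));
    }
    return pool.total_created();
}
```

The pool counts creations rather than live objects because the error was reported to depend on how many instances were used over the application's lifespan, not on how many exist at once. The sketch is single-threaded; a pipeline that acquires contexts from several threads would need to guard the pool with a mutex.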