DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE AGX Xavier
Hi! I am facing a strange issue. For each frame in our pipeline we have a unique NvSciBufObj (containing the frame), a unique NvMedia2D instance (created with NvMedia2DCreate), and a unique NvSciSyncObj that is registered with the frame’s NvMedia2D instance as an EOF sync object.
Before any NvMedia2D operation we set the object for the next EOF sync with NvMedia2DSetNvSciSyncObjforEOF. After the NvMedia2D operations we get a fence with NvMedia2DGetEOFNvSciSyncFence and then wait on that fence.
It works well in most cases.
We observed that if a queue in our pipeline gets long, then after a while NvMedia2DBlitEx starts returning 7 (NVMEDIA_STATUS_ERROR, which indicates that some other error occurred) and we get strange error messages:
Module_id 21 Severity 2 : sAddReadOnlyFences
Module_id 21 Severity 2 : : Couldn’t find a place to store the fence
Module_id 48 Severity 2 : Failed to add source surface fence
nvmedia: ERROR: NvMedia2DBlitEx failed: 7 at VidCropFrame():53
Then we cannot get the fence for the NvMedia2D EOF:
Module_id 45 Severity 2 : Invalid id
Module_id 24 Severity 2 : Getting nvscisyncfence information from nvrmfence failed
ERROR: Couldn’t get the fence for the ongoing NvMedia2D operation: 7
Error code 7 in the second case indicates that the function was called before NvMedia2DBlitEx() was called, which makes sense because the blit failed previously.
I am wondering what the limitation could be here. Why does NvMedia2DBlitEx fail if we have many NvMedia2D/NvSciSyncObj instances? If we have fewer than ~25, our pipeline works well. If we have more than ~25, we get the error I mentioned.
What do you mean by ‘a queue in our pipeline gets long’? Does this imply queue overflow, and is it possible to take measures to avoid it?
For each buffer in our pipeline we have a unique NvSciBufObj (containing the frame), a unique NvMedia2D instance (created with NvMedia2DCreate), and a unique NvSciSyncObj. We have queues in the pipeline, and the queues can grow. The more a queue grows, the more active buffers the pipeline uses (and so the more NvSciBufObj/NvMedia2D/NvSciSyncObj instances we have). It is not uncontrolled and it is not an overflow (30-40 buffers in active use is normal). But above ~25 active buffers (each with one unique NvSciBufObj/NvMedia2D/NvSciSyncObj) we start to get the error message I quoted.
The error message indicates that the application encountered a problem related to the availability of empty slots in the read fences list of the NvMedia array, which typically has 16 slots. This situation is unusual and not something we have encountered before.
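To make the "no empty slot" condition concrete, here is a minimal toy model in C. It is only a sketch of the idea described above (a fixed list of 16 read-fence slots that rejects a new fence when every slot is occupied). It is not NVIDIA's implementation: `ToySurface`, `toy_add_read_fence`, `toy_retire_read_fence` and `toy_submit_reads` are hypothetical names invented for illustration, and the model does not explain why the reported threshold is around 25 buffers rather than 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define READ_FENCE_SLOTS 16 /* slot count quoted in the reply above */

/* Hypothetical per-surface bookkeeping: a fixed list of pending read fences.
 * A fence is modelled as a nonzero id; 0 marks an empty slot. */
typedef struct {
    unsigned read_fences[READ_FENCE_SLOTS];
} ToySurface;

/* Store a fence in the first empty slot.
 * Returns false when every slot is occupied, which is the toy equivalent of
 * "Couldn't find a place to store the fence". */
bool toy_add_read_fence(ToySurface *s, unsigned fence_id)
{
    assert(s != NULL && fence_id != 0);
    for (size_t i = 0; i < READ_FENCE_SLOTS; ++i) {
        if (s->read_fences[i] == 0) {
            s->read_fences[i] = fence_id;
            return true;
        }
    }
    return false;
}

/* Free the slot holding fence_id once the corresponding read has completed.
 * Returns false if no slot holds that fence. */
bool toy_retire_read_fence(ToySurface *s, unsigned fence_id)
{
    assert(s != NULL && fence_id != 0);
    for (size_t i = 0; i < READ_FENCE_SLOTS; ++i) {
        if (s->read_fences[i] == fence_id) {
            s->read_fences[i] = 0;
            return true;
        }
    }
    return false;
}

/* Submit n reads with ids first_id, first_id + 1, ... without retiring any.
 * Stops at the first rejection and returns how many were accepted. */
unsigned toy_submit_reads(ToySurface *s, unsigned first_id, unsigned n)
{
    unsigned accepted = 0;
    for (unsigned k = 0; k < n; ++k) {
        if (!toy_add_read_fence(s, first_id + k)) {
            break;
        }
        ++accepted;
    }
    return accepted;
}
```

In this model, submitting 20 reads against a zero-initialised `ToySurface` accepts only the first 16, and a further read succeeds only after one of the pending fences has been retired. Whether the real driver behaves this way for a source surface shared across many pending blits is an assumption, not something confirmed in this thread.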
Thank you for investigating the issue. I am aware of only one limitation: for every NvMedia2D handle, only 16 NvSciSyncObj can be registered (documentation). This is why we create a unique NvMedia2D object for each buffer in the pipeline. So for each buffer we create one NvMedia2D object and one NvSciSyncObj object, and register the NvSciSyncObj as an EOF sync object with the NvMedia2D object that belongs to the buffer. Then, on demand, when we call some NvMedia2D operation on the buffer, we set the NvSciSyncObj for EOF, call the NvMedia2D operation, get an EOF fence and wait on it. So for each and every NvMedia2D object there is only one NvSciSyncObj and only one fence.
But if we have more than ~25 buffers (so ~25 NvMedia2D/NvSciSyncObj/fence combinations), we get the error message I quoted.
So I don’t see what the limitation is in our case, as we use only one sync object and one fence per NvMedia2D handle, and I don’t see any documented limit on how many NvMedia2D objects we can create.