Issue when switching from NvBufferTransform() to NvBufferTransformAsync()

I'm trying to introduce concurrency to my image processing pipeline. I believe the pipeline worked correctly in its synchronous state: the color space transforms were working and the CUDA kernel was being applied as expected.

My Pipeline: FrameCapture → NvBufferTransform → Cuda Kernel → NvBufferTransform → Encoder.
(Based on the sample apps 10_camera_recording and 03_video_cuda_enc)

My approach was to swap the first synchronous transform for the asynchronous one in a way that keeps the full pipeline synchronous, as a simple first step to prove I could use the async functions correctly. I replaced the NvBufferTransform() call with the code below.

(In both the sync and async cases I'm using NvBufferMemMap and NvBufferMemSyncForDevice after the transform completes, although I'm not sure whether this is actually required.)

// Convert input NV12 frame to ARGB32 asynchronously, storing in pre-allocated argbFd.
int ret = NvBufferTransformAsync(inputDmabufFd, argbFd, &transParams, &syncobj);
if (ret < 0) {
    ORIGINATE_ERROR("Failed to start async NV12 to ARGB transformation");
}

// Wait for transformation to complete, ensuring blocking behavior
ret = NvBufferSyncObjWait(&syncobj.outsyncobj, NVBUFFER_SYNCPOINT_WAIT_INFINITE);
if (ret < 0) {
    ORIGINATE_ERROR("Failed to wait for async transformation completion");
}
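The cache-maintenance step mentioned earlier can be sketched as follows (assuming the JetPack 4.x nvbuf_utils.h API; the plane index and error handling here are illustrative, not taken from my actual code):

```cpp
// Sketch of the NvBufferMemMap / NvBufferMemSyncForDevice step (nvbuf_utils.h).
void *virtAddr = NULL;

// Map plane 0 of the ARGB buffer for CPU access.
if (NvBufferMemMap(argbFd, 0, NvBufferMem_Read_Write, &virtAddr) == 0) {
    // Flush the CPU cache lines for this mapping so the device (VIC/GPU)
    // sees any CPU-side writes before the next pipeline stage reads them.
    NvBufferMemSyncForDevice(argbFd, 0, &virtAddr);
    NvBufferMemUnMap(argbFd, 0, &virtAddr);
}
```

If the CPU never reads or writes the mapped buffer, this step may be unnecessary, since a VIC-to-CUDA handoff would not go through the CPU caches.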

This did not produce the behaviour I expected.

The CUDA kernel is no longer applied consistently to each frame. The top-right corner of each frame has an area where the kernel has not been applied, and the size of this area varies from frame to frame, suggesting a timing aspect to this issue. Also, the async transform takes ~3 ms as opposed to ~10 ms for the sync version. These symptoms suggest some kind of synchronization issue, maybe?

I'm possibly missing something important about synchronizing the different stages of the pipeline.

Any help with this issue would be appreciated.

Hi,

Please follow the sample at the link below and see if it works:

Thanks.

Hi. Thanks for the quick response!

Sorry, I'm not sure what you are asking me to do. Would you like me to check that my code does the async transform the same way this patch does, or to patch the 00_video_decode sample application and verify that it works?

My code does follow the same pattern as the patch: I create the NvBufferSession, use it to populate the NvBufferTransformParams object passed to NvBufferTransformAsync, and then destroy the NvBufferSession when the capture has completed.
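For concreteness, that session lifecycle looks roughly like this (a sketch assuming the JetPack 4.x nvbuf_utils.h API, where NvBufferTransformParams carries a session field; the filter choice is illustrative):

```cpp
// Sketch of the session lifecycle described above (nvbuf_utils.h API).
NvBufferSession session = NvBufferSessionCreate();

NvBufferTransformParams transParams;
memset(&transParams, 0, sizeof(transParams));
transParams.transform_flag = NVBUFFER_TRANSFORM_FILTER;
transParams.transform_filter = NvBufferTransform_Filter_Smart;
transParams.session = session;  // route transforms through this session

// ... per-frame NvBufferTransformAsync(inputDmabufFd, argbFd, &transParams,
//     &syncobj) calls during capture ...

NvBufferSessionDestroy(session);  // once the capture has completed
```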

Hi,
The NvBuffer APIs are deprecated on JetPack 5. Please use the NvBufSurface APIs instead.

The latest release for Xavier is JetPack 5.1.5. If you are on an earlier version, we would suggest upgrading to the latest release and trying again.
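For anyone who can upgrade, the equivalent asynchronous transform in the NvBufSurface APIs looks roughly like this (a sketch based on nvbufsurftransform.h; the parameter setup, error handling, and timeout value are illustrative):

```cpp
#include "nvbufsurface.h"
#include "nvbufsurftransform.h"

// src and dst are assumed to be previously allocated NvBufSurface handles.
NvBufSurfTransformParams params = {0};
NvBufSurfTransformSyncObj_t syncObj = NULL;

// Queue the transform and get a sync object back instead of blocking.
NvBufSurfTransform_Error err =
    NvBufSurfTransformAsync(src, dst, &params, &syncObj);
if (err != NvBufSurfTransformError_Success) {
    // handle error
}

// Block until the queued transform completes (timeout in ms; illustrative).
NvBufSurfTransformSyncObjWait(syncObj, 1000);
NvBufSurfTransformSyncObjDestroy(&syncObj);
```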

Thanks for your reply.

We are not able to update to the latest release at this time unfortunately.

Is this behaviour a known issue or do you think this approach should work? Any help or suggestions would be very much appreciated.

Hi,
It may be a potential issue in NvBufferTransformAsync(). If NvBufferTransform() works, we would suggest using that function and creating an NvBufferSession per thread.

Does this allow me to achieve asynchronous buffer transforms?

Hi,
If each thread has its own NvBufferSession, the tasks within a session are scheduled to the hardware engine sequentially, but a thread executing NvBufferTransform() does not block other threads calling NvBufferTransform(). Within each thread, however, NvBufferTransform() is a blocking call.
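In other words, concurrency comes from running blocking NvBufferTransform() calls on multiple threads, each owning its own session. A minimal sketch, assuming the JetPack 4.x nvbuf_utils.h API (the worker function, filter choice, and usage lines are illustrative):

```cpp
#include <cstring>
#include <thread>
#include "nvbuf_utils.h"

// Illustrative worker: each thread owns a session, so its transforms are
// queued to the hardware engine independently of other threads' sessions.
static void transformWorker(int srcFd, int dstFd)
{
    NvBufferSession session = NvBufferSessionCreate();

    NvBufferTransformParams params;
    memset(&params, 0, sizeof(params));
    params.transform_flag = NVBUFFER_TRANSFORM_FILTER;
    params.transform_filter = NvBufferTransform_Filter_Smart;
    params.session = session;

    // Blocks this thread until the transform finishes, but does not
    // serialize against NvBufferTransform() calls in other threads.
    NvBufferTransform(srcFd, dstFd, &params);

    NvBufferSessionDestroy(session);
}

// Usage (illustrative): one transform per thread, run concurrently.
// std::thread t1(transformWorker, fdA, fdB);
// std::thread t2(transformWorker, fdC, fdD);
// t1.join(); t2.join();
```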