Seeking Alternative to NvSciBufObjPutPixels for Lower Latency and CPU Usage

I’m considering using the NvMedia 2D hardware accelerator for the data transfer and layout conversion instead of NvSciBufObjPutPixels. The proposed approach (see image_2d.c) is:

  1. Source NvSciBufObj (srcBufObj):
  • Allocate an NvSciBufObj configured with NvSciBufImage_PitchLinearType layout and NeedCpuAccess = true.
  • Get the raw YUV data and memcpy it directly into this srcBufObj (via NvSciBufObjGetCpuPtr).
  2. Destination NvSciBufObj (dstBufObj):
  • Allocate another NvSciBufObj configured with NvSciBufImage_BlockLinearType layout, as required by the IEP. This will be the buffer fed to NvMediaIEPFeedFrame.
  3. NvMedia 2D for Transfer & Conversion:
  • Initialize an NvMedia2D instance.
  • Register both srcBufObj and dstBufObj with the NVM2D instance.
  • Use NvMedia2DCompose to transfer the data from srcBufObj to dstBufObj. The NVM2D hardware would handle the conversion from pitch-linear to block-linear layout.
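
For the memcpy in step 1, one detail that may matter: a pitch-linear buffer can pad each row, so a single memcpy of the whole frame is only safe if the plane pitch equals the row width. Below is a minimal sketch of a row-by-row copy. It assumes 8-bit YUV 4:2:0 semi-planar (NV12-style) input with even dimensions, which is my assumption since the format isn't stated above. PlaneLayout, copy_plane, and copy_nv12_to_pitch_linear are hypothetical names, not NvSci APIs; in a real application the offsets and pitches would be queried from the allocated buffer's attributes (for example NvSciBufImageAttrKey_PlaneOffset and NvSciBufImageAttrKey_PlanePitch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one plane inside the CPU-mapped buffer.
 * Assumption: these values come from the buffer's reconciled attributes,
 * they are not hard-coded in real code. */
typedef struct {
    size_t offset; /* byte offset of the plane from the CPU pointer */
    size_t pitch;  /* bytes per row, including any padding */
} PlaneLayout;

/* Copy `rows` tightly packed rows of `row_bytes` each into a destination
 * whose rows are `dst_pitch` bytes apart. */
static void copy_plane(uint8_t *dst, size_t dst_pitch,
                       const uint8_t *src, size_t row_bytes, size_t rows)
{
    assert(dst_pitch >= row_bytes);
    for (size_t y = 0; y < rows; ++y) {
        memcpy(dst + y * dst_pitch, src + y * row_bytes, row_bytes);
    }
}

/* Copy a tightly packed NV12-style frame (Y plane followed by interleaved
 * UV plane) into a pitch-linear buffer. `cpu_ptr` stands in for the pointer
 * returned by NvSciBufObjGetCpuPtr. */
static void copy_nv12_to_pitch_linear(uint8_t *cpu_ptr,
                                      const PlaneLayout planes[2],
                                      const uint8_t *raw,
                                      size_t width, size_t height)
{
    const uint8_t *raw_y  = raw;
    const uint8_t *raw_uv = raw + width * height;

    /* Y plane: `height` rows of `width` bytes. */
    copy_plane(cpu_ptr + planes[0].offset, planes[0].pitch,
               raw_y, width, height);
    /* UV plane: `height / 2` rows of `width` bytes (interleaved U and V). */
    copy_plane(cpu_ptr + planes[1].offset, planes[1].pitch,
               raw_uv, width, height / 2);
}
```

If the pitch happens to equal the width for every plane, this degenerates to the same bytes a single memcpy would write, so the per-row loop only costs anything when padding is present.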

Question:
Is this a viable and recommended approach to offload the CPU-intensive NvSciBufObjPutPixels work to the NVM2D hardware for preparing IEP input buffers?