Seeking Alternative to NvSciBufObjPutPixels for Lower Latency and CPU Usage

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.10.0
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other DRIVE OS 6.0.9.0 SDK

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
2.1.0
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Dear NVIDIA Developer Support Team,

I’m currently working with the NvMedia SDK (version 6.0.9.0) and have encountered performance issues with the NvSciBufObjPutPixels function in our application.

Current Implementation:

  • Input: YUV420 planar video at 1936x1220 resolution, 10 FPS
  • Using NvSciBufObjPutPixels to transfer pixel data to NvSciBuf objects in nvm_iep_sci sample
  • Observed latency: ~30ms per frame
  • High CPU utilization during execution

Here is the relevant code from scibuf_utils.c:

NvMediaStatus
ReadInput(
    char *fileName,
    uint32_t frameNum,
    uint32_t width,
    uint32_t height,
    NvSciBufObj bufObj,
    ChromaFormat inputFileChromaFormat,
    bool uvOrderFlag,
    uint32_t pixelAlignment)
{
    /* ... */
    err = NvSciBufObjPutPixels(bufObj, NULL, (const void **)pBuff, pBuffSizes,
            pBuffPitches);
    if (err != NvSciError_Success) {
        LOG_ERR("NvSciBufObjPutPixels failed.");
        status = NVMEDIA_STATUS_ERROR;
        goto done;
    } else {
        status = NVMEDIA_STATUS_OK;
    }
done:
    /* ... */
    return status;
}

This performance overhead is impacting our real-time processing pipeline. Could you please advise:

  1. Are there alternative APIs that provide lower-latency pixel data transfer?
  2. Are there recommended optimization techniques for this use case?

We’d greatly appreciate any guidance or code samples that could help reduce both latency and CPU overhead.

Thank you for your support!

Best regards,
HUANG WEIJIA

Dear @huangweijia.alex,
Is the use case to encode a live image stream? If so, we have a DW sample for this (DriveWorks SDK Reference: Camera Sample). Does that serve your purpose?

If the use case is to read from a file and fill an NvSciBufObj, then we don’t have any other functions. NvSciBufObjPut(Get)Pixels are not intended for production use; they are intended for debug purposes only.

Dear NVIDIA Support Team,

Thank you for your response! To clarify my use case:

  1. Current Flow:
  • I receive I420/NV12 YUV data from an external source (my input is actually I420/NV12 YUV data sourced from cameras, but I receive it through an intermediate layer rather than accessing the cameras directly).
  • I manually copy the data into an intermediate buffer via memcpy.
  • I call NvSciBufObjPutPixels() to transfer it to bufObj (used for IEP encoding).
  2. Performance Concern:
  • NvSciBufObjPutPixels() adds ~30 ms of latency, which is unacceptable for real time.
  • As you mentioned, this API is debug-only. What is the recommended production alternative?
  3. Key Questions:
  • If I must start from I420/NV12 data, is NvSciBufObjPutPixels() the only option?
  • Are there zero-copy methods to wrap external YUV data directly into an NvSciBufObj?
  4. Current Development Context:
  • Using samples from nv-driveos-repo-sdk-linux-6.0.9.0-35041135_6.0.9.0_amd64.deb; the guide link is the DRIVE OS 6.0.9 release.
  • I cannot locate the DriveWorks Camera Sample in this package. Could you confirm where to access it?
  5. Additional Question:
    Can we use NvSciBufImage_PitchLinearType for both IEP encoder creation and NvSciBufObj memory allocation, then copy the YUV data via NvSciBufObjGetCpuPtr + memcpy before encoding? Would this be a valid production alternative to NvSciBufObjPutPixels?

FYI, issues on DRIVE OS 6.0.9 are not supported via the forum. Please reach out to your NVIDIA representative for the right support channel. The Debian package you shared only installs DRIVE OS; you need the DriveWorks .deb files to install DW (https://developer.nvidia.com/docs/drive/drive-os/6.0.9/public/drive-os-linux-sdk/external/drive-quickstart-guide/dita/common/topics/installation/debian-packages/nvidia-debian-packages-linux.html).

Could you explain how you are receiving camera frame data in your use case?

Thanks for your reply! Here is how I receive frame data.

Data Acquisition Pipeline:

  1. Input Source:
    Receives NV12-format video frames through an intermediate abstraction layer (not direct camera access). The raw YUV data is packaged in a std::unique_ptr<unsigned char[]> buffer.

  2. Frame Handling:

  • Constructs video frames with metadata (resolution: WxH, pixel format: NV12, timestamps)
  • Uses NvSciBufObjPutPixels for buffer transfer (acknowledged as debug-only)
  3. Architecture Flow:
    Camera → Intermediate Layer → SDK → NvSciBuf Processing → Encoder

The sample code is like this, where imageData is the input data:

// imageData holds a full NV12 frame: the Y plane (width*height bytes)
// followed by the interleaved UV plane (width*height/2 bytes).
void ProcessNV12Frame(std::unique_ptr<uint8_t[]>& imageData, int width, int height) {
    VideoFrame frame {
        .buffer_type = RAW_MEMORY,
        .format = NV12,
        .planes = {
            {imageData.get(), width},                // Y plane, pitch = width
            {imageData.get() + width*height, width}  // UV plane, pitch = width
        },
        .width = width,
        .height = height
    };
    engine->PushFrame(frame);  // Forward to processing
}

I’m considering using the NvMedia 2D hardware accelerator for the data transfer and layout conversion instead of NvSciBufObjPutPixels. The proposed approach (refer to image_2d.c) is:

  1. Source NvSciBufObj (srcBufObj):
  • Allocate an NvSciBufObj configured with NvSciBufImage_PitchLinearType layout and NeedCpuAccess = true.
  • Get raw YUV data and memcpy it directly into this srcBufObj (via NvSciBufObjGetCpuPtr).
  2. Destination NvSciBufObj (dstBufObj):
  • Allocate another NvSciBufObj configured with NvSciBufImage_BlockLinearType layout, as required by the IEP. This will be the buffer fed to NvMediaIEPFeedFrame.
  3. NvMedia 2D for Transfer & Conversion:
  • Initialize an NvMedia2D instance.
  • Register both srcBufObj and dstBufObj with the NVM2D instance.
  • Use NvMedia2DCompose to transfer data from srcBufObj to dstBufObj. The NVM2D hardware would handle the conversion from pitch-linear to block-linear layout.

Question:
Is this a viable and recommended approach to offload the CPU-intensive NvSciBufObjPutPixels work to the NVM2D hardware for preparing IEP input buffers?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.