Camera DMA buffer to VPIImage as efficiently as possible

I am using libargus to capture frames via DMA from a stereo camera. I am trying to construct a VPIImage from the frame, accessible to the VIC (bonus points for OFA and PVA as well) in the most efficient way possible (preferably ISP only, no CPU or VIC utilization to do so).

Potential input paths:

  1. The DMA buffer file descriptor. Not sure if there’s anything I can do with this directly, but it seems plausible that I should be able to wrap it with a VPIImage. Or perhaps first wrap it in NvBuffer, then wrap with VPIImage.

  2. An EGLStream::Frame. I can get this by configuring the argus OutputStream as STREAM_TYPE_EGL, then grabbing frames via a consumer.

    EGLStream::IFrameConsumer* consumer =
        Argus::interface_cast<EGLStream::IFrameConsumer>(cam.consumer);
    Argus::UniqueObj<EGLStream::Frame> frame(consumer->acquireFrame());

From this I can copy to an NvBuffer using IImageNativeBuffer::copyToNvBuffer, as is done in the libargus samples. The NvBuffer can then (at least in theory) be converted to a VPIImage, though I haven't figured out the NvBuffer → VPIImage conversion yet. The performance of this option is poor even before the VPIImage is created: copying the Frame to an NvBuffer uses ~15% of the VIC per camera @30fps 1920x1200. I have 8 cameras and other algorithms I want to run on the VIC, so the copy isn't tenable.
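For what it's worth, VPI 2.x appears to allow wrapping an NvBuffer dmabuf fd directly, without a pixel copy. Below is a minimal sketch of that path; it assumes `dmabuf_fd` came from NvBufferCreate/copyToNvBuffer in a layout and format VPI accepts (e.g. NV12), and it omits error handling:

```cpp
#include <cstring>
#include <vpi/Image.h>

// Hedged sketch: wrap an existing NvBuffer (dmabuf fd) as a VPIImage in
// VPI 2.x so the VIC can consume it without copying pixels. dmabuf_fd is
// assumed to reference a buffer whose layout/format VPI supports.
VPIImage wrap_nvbuffer(int dmabuf_fd)
{
    VPIImageData data;
    std::memset(&data, 0, sizeof(data)); // zero unused fields

    data.bufferType = VPI_IMAGE_BUFFER_NVBUFFER;
    data.buffer.fd  = dmabuf_fd;

    VPIImage image = nullptr;
    vpiImageCreateWrapper(&data, nullptr, VPI_BACKEND_VIC, &image);
    return image;
}
```

This still leaves the copyToNvBuffer cost in place, so it only removes the NvBuffer → VPIImage step, not the VIC copy itself.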

  3. EGLImageKHR. This is the most efficient method I have found so far. I can accomplish this by configuring the stream as follows:
    Argus::UniqueObj<Argus::OutputStreamSettings> stream_settings(
        isession->createOutputStreamSettings(Argus::STREAM_TYPE_BUFFER));
    auto istream_settings =
        Argus::interface_cast<Argus::IBufferOutputStreamSettings>(stream_settings);
    istream_settings->setBufferType(Argus::BUFFER_TYPE_EGL_IMAGE);

Acquiring the filled buffer returns an Argus::Buffer which can be cast to EGLImageKHR. Wrapping in a VPIImage can then be done like so:

VPIImage vpi_image = nullptr;
VPIImageData data;
memset(&data, 0, sizeof(data)); // zero unused fields
data.bufferType = VPI_IMAGE_BUFFER_EGLIMAGE;
data.buffer.egl = egl_image;
vpiImageCreateWrapper(&data, nullptr, VPI_BACKEND_VIC | VPI_RESTRICT_MEM_USAGE, &vpi_image);

Calling vpiImageCreateWrapper uses ~5-10% of the CPU @30fps 1920x1200. Are there any tweaks I can make to this method, perhaps to the buffer allocation, that would allow the buffers to be used on the VIC (or other accelerators) without any memory operations when wrapping? Maybe I am being overly optimistic, but video encode can be done without the CPU, so I was hopeful I could feed images directly to the VIC, OFA, etc. without the CPU as well.

Any guidance is appreciated!

Hi,

Please find below the nvargus ↔ VPI sample:
https://elinux.org/Jetson/L4T/TRT_Customized_Example#VPI_with_Argus_Camera_-_nvarguscamerasrc

The sample wraps the VPI Image from NvBuffer.
Please note that the API changed in VPI 2.x/3.x, but the overall wrapping approach is similar:

https://docs.nvidia.com/vpi/2.3/group__VPI__Image.html#ga3e7cf2520dd568a7e7a9a6876ea7995c

Thanks.

This uses the IImageNativeBuffer::copyToNvBuffer that I’ve already found to be inefficient. I suppose that means I’ve already figured out the optimal solution (i.e. option 3 with EGLImageKHR)?

Also, this example is even worse, because it appears to allocate an entirely new buffer for the VPI image instead of wrapping (but I can't test it because I have VPI 2).

Figured it out. The gst-nvarguscamera example performs poorly, but using EGLImageKHR, creating the first VPIImage with vpiImageCreateWrapper, and updating subsequent frames with vpiImageSetWrapper performs well.
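For anyone landing here later, the pattern above boils down to paying the wrapper-creation cost once and then retargeting the same VPIImage at each new buffer. A minimal sketch, assuming `on_frame` is a hypothetical per-frame callback that receives the EGLImageKHR for the acquired Argus::Buffer:

```cpp
#include <cstring>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <vpi/Image.h>

// Wrap-once / rewrap-per-frame sketch. The first frame pays the full
// vpiImageCreateWrapper cost; later frames only redirect the existing
// wrapper with vpiImageSetWrapper, which is cheap.
static VPIImage vpi_image = nullptr;

void on_frame(EGLImageKHR egl_image)
{
    VPIImageData data;
    std::memset(&data, 0, sizeof(data)); // zero unused fields
    data.bufferType = VPI_IMAGE_BUFFER_EGLIMAGE;
    data.buffer.egl = egl_image;

    if (vpi_image == nullptr) {
        // First frame: create the wrapper (expensive, done once).
        vpiImageCreateWrapper(&data, nullptr,
                              VPI_BACKEND_VIC | VPI_RESTRICT_MEM_USAGE,
                              &vpi_image);
    } else {
        // Subsequent frames: just point the wrapper at the new buffer.
        vpiImageSetWrapper(vpi_image, &data);
    }

    // ... submit VIC/OFA work using vpi_image, then release the buffer ...
}
```

Note vpiImageSetWrapper requires the new buffer to have the same format and dimensions as the one the wrapper was created with, which holds here since the camera cycles a fixed pool of buffers.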

Hi,

This should depend on the use case.

Since cameras usually reuse the same buffer, in some cases (e.g. filtering) copying the data to another buffer is preferred.
But if your use case is a read-only process, wrapping the VPI image should be optimal.

Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.