We have the following pipeline, where we process video from each camera on the GPU/CPU and then select the output from either camera to be encoded.
Cam1 --> GPU --> CPU (frame processing and data annotation) -->|
                                                               |--> select either cam source --> encoder --> encoded bitstream
Cam2 --> GPU --> CPU (frame processing and data annotation) -->|
When we operate each camera at 1080p30, some of our CV routines (aaNewOCVConsumerThread, aaCamCaptureThread) eat up a significant percentage of cycles. However, when we operate each camera at 1080p60, some of the NVIDIA resource manager APIs seem to take a significant percentage of cycles (report from NVIDIA System Profiler 3.9 attached).
Could someone please help us get to the bottom of this? Why does the NVIDIA resource manager take so many cycles? What is going on in v4l2convert.so? Is there a way to avoid whatever conversion is happening here?
Hi,
You may share the profiler report via any free online service. We suggest you break down the pipeline to clarify which stage the high loading of the video converter comes from.
Hi,
v4l2convert.so is a third-party prebuilt library in the V4L2 framework, not owned by NVIDIA. From the result, it looks to be triggered by the high frame rate. At 60 fps, you will see 60 capture requests per second via v4l2_ioctl.
One thing you can check is the sensor mode versus the output resolution; if the output resolution does not match a native sensor mode, a conversion may be required. Taking the onboard camera as an example, it supports three sensor modes, which you can enumerate as in the sketch below.
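For reference, a minimal sketch (not from this thread) of enumerating the sensor modes through Argus, so the requested output resolution can be checked against a native mode. It assumes an Argus version where getResolution() returns Size2D<uint32_t>, and that device is an already-acquired CameraDevice*:

#include <Argus/Argus.h>
#include <cstdio>
#include <vector>

static void printSensorModes(Argus::CameraDevice *device)
{
    Argus::ICameraProperties *iProps =
        Argus::interface_cast<Argus::ICameraProperties>(device);
    if (!iProps)
        return;

    std::vector<Argus::SensorMode*> modes;
    iProps->getAllSensorModes(&modes);
    for (size_t i = 0; i < modes.size(); i++)
    {
        Argus::ISensorMode *iMode =
            Argus::interface_cast<Argus::ISensorMode>(modes[i]);
        if (!iMode)
            continue;
        Argus::Size2D<uint32_t> res = iMode->getResolution();
        printf("Sensor mode %zu: %u x %u\n", i, res.width(), res.height());
    }
}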
We are actually using the native sensor resolution as our final output resolution in our code. How can we zero in on what is triggering an unsolicited conversion, and how can we get rid of it?
I do not understand why high fps would trigger it. Any idea what calls could be going on inside v4l2convert.so? Or is there a quick and easy way for the owner of this library to offer a "no-operation" version of those APIs? That could help determine whether those calls are necessary or not.
Furthermore, in order to imitate the simplified pipeline
Cam1 ---> renderer
I tried the argus_camera app, which seems much more optimal. From the profiler result, that app does NOT appear to make use of v4l2convert.so. How can I find out what leads up to the call to v4l2convert.so in my code, and how can I eliminate it?
Hi,
The pipeline is built in libargus.so. Your application should run the same capture pipeline.
The source code of argus_camera is available at tegra_multimedia_api/argus/apps/camera. You may compare it with your application.
You may also download v4l-utils.git (media (V4L2, DVB and IR) applications and libraries), add debug prints, and rebuild v4l2convert.so to check where the loading comes from. The branch is stable-1.0.
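As a lighter-weight alternative to rebuilding the library, the V4L2 ioctl traffic can be traced with an LD_PRELOAD shim. A sketch (not from this thread), assuming glibc's variadic ioctl() signature; build with g++ -shared -fPIC -o ioctl_trace.so ioctl_trace.cpp -ldl and run with LD_PRELOAD=./ioctl_trace.so:

#include <cstdarg>
#include <cstdio>
#include <dlfcn.h>    // RTLD_NEXT needs _GNU_SOURCE, which g++ defines by default on Linux
#include <linux/videodev2.h>

// Intercepts ioctl() and logs V4L2 dequeue requests before forwarding them.
extern "C" int ioctl(int fd, unsigned long request, ...)
{
    typedef int (*ioctl_fn)(int, unsigned long, void *);
    static ioctl_fn real_ioctl = (ioctl_fn)dlsym(RTLD_NEXT, "ioctl");

    va_list ap;
    va_start(ap, request);
    void *arg = va_arg(ap, void *);
    va_end(ap);

    if (request == VIDIOC_DQBUF)
        fprintf(stderr, "ioctl(VIDIOC_DQBUF) on fd %d\n", fd);

    return real_ioctl(fd, request, arg);
}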
I studied/copied tegra_multimedia_api/argus/apps/camera and compared it with what I have implemented. I am pretty much using the same Argus APIs that the argus_camera app uses. While I am still working on this performance issue, I would like to ask a pipeline question related to the mapping of buffers. I suspect it could be leading to a few unwanted conversion API calls.
My pipeline looks like this:
Cam1 --> GPU --> CPU (frame processing and data annotation)
More specifically:
Cam1 --> map-and-enqueue --> GPU --> CPU (frame processing and data annotation) --> dequeue-and-unmap
I am wondering whether the act of mapping an input frame for storage in a queue could be causing any format conversion.
This is what I do:
1. Acquire a frame and map it for further access on the GPU and CPU:
UniqueObj<Frame> frame(iFrameConsumer->acquireFrame());
IFrame *iFrame = interface_cast<IFrame>(frame);
if (!iFrame)
    break;

// Get the Frame's Image.
Image *image = iFrame->getImage();

IArgusCaptureMetadata *iArgusCaptureMetadata = interface_cast<IArgusCaptureMetadata>(frame);
if (!iArgusCaptureMetadata)
    ORIGINATE_ERROR("Failed to get IArgusCaptureMetadata interface.");
CaptureMetadata *metadata = iArgusCaptureMetadata->getMetadata();
ICaptureMetadata *iMetadata = interface_cast<ICaptureMetadata>(metadata);
if (!iMetadata)
    ORIGINATE_ERROR("Failed to get ICaptureMetadata interface.");

EGLStream::NV::IImageNativeBuffer *iImageNativeBuffer
    = interface_cast<EGLStream::NV::IImageNativeBuffer>(image);
TEST_ERROR_RETURN(!iImageNativeBuffer, "Failed to create an IImageNativeBuffer");

// aaFrameBuffer is a data struct which encapsulates a few pointers and an NvBuffer.
// Note: requesting YUV420/pitch-linear here may itself require a conversion if the
// captured format/layout differs.
aaFrameBuffer *framedata = new aaFrameBuffer;
framedata->framefd = iImageNativeBuffer->createNvBuffer(
        ARGUSSIZE {m_pCamInfo->liveParams.inputVideoInfo.width,
                   m_pCamInfo->liveParams.inputVideoInfo.height},
        NvBufferColorFormat_YUV420, NvBufferLayout_Pitch, &status);
NvBufferGetParams(framedata->framefd, &(framedata->nvBufParams));

// fsizeY as computed spans the whole Y+U+V allocation (U and V planes are the
// same size in YUV420), so the first mmap() below maps the entire buffer.
framedata->fsizeY = framedata->nvBufParams.offset[1]
        + (framedata->nvBufParams.offset[2] - framedata->nvBufParams.offset[1]) * 2;
framedata->fsizeU = framedata->nvBufParams.pitch[1] * framedata->nvBufParams.height[1];
framedata->fsizeV = framedata->nvBufParams.pitch[2] * framedata->nvBufParams.height[2];
m_pCamInfo->procInfo.pitchWidthY = framedata->nvBufParams.pitch[0];
m_pCamInfo->procInfo.pitchWidthU = framedata->nvBufParams.pitch[1];
m_pCamInfo->procInfo.pitchWidthV = framedata->nvBufParams.pitch[2];

AACAM_CAPTURE_PRINT("4 Starting frame capture %d\n", m_currentFrame);

// Map each plane of the dmabuf into the process address space.
framedata->dataY = (char *)mmap(NULL, framedata->fsizeY, PROT_READ | PROT_WRITE, MAP_SHARED, framedata->framefd, framedata->nvBufParams.offset[0]);
framedata->dataU = (char *)mmap(NULL, framedata->fsizeU, PROT_READ | PROT_WRITE, MAP_SHARED, framedata->framefd, framedata->nvBufParams.offset[1]);
framedata->dataV = (char *)mmap(NULL, framedata->fsizeV, PROT_READ | PROT_WRITE, MAP_SHARED, framedata->framefd, framedata->nvBufParams.offset[2]);
2. The frame from step 1 is put in a queue (Q).
3. The GPU reads the frame from the Q and processes it.
4. The CPU reads the output of the GPU and the older frame from the Q.
5. After a delay of about 8 frames, a given frame is popped from the Q, unmapped, and destroyed (a sketch of this queue follows).
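A minimal sketch of the delay queue in steps 2-5, with illustrative names (aaFrameBuffer mirrors the fields used in the capture code above; kFrameDelay and enqueueFrame are hypothetical):

#include <queue>
#include <sys/mman.h>
#include "nvbuf_utils.h"

struct aaFrameBuffer
{
    int framefd;                     // dmabuf fd from createNvBuffer()
    char *dataY, *dataU, *dataV;     // mmap()'ed plane pointers
    size_t fsizeY, fsizeU, fsizeV;   // mapped sizes
    NvBufferParams nvBufParams;
};

static std::queue<aaFrameBuffer*> frameQ;
static const size_t kFrameDelay = 8;     // frames kept in flight

// Called once per captured frame after mapping (step 2); retires the oldest
// frame once the queue is deeper than the delay (step 5).
static void enqueueFrame(aaFrameBuffer *fb)
{
    frameQ.push(fb);
    if (frameQ.size() <= kFrameDelay)
        return;

    aaFrameBuffer *old = frameQ.front();
    frameQ.pop();
    munmap(old->dataY, old->fsizeY);
    munmap(old->dataU, old->fsizeU);
    munmap(old->dataV, old->fsizeV);
    NvBufferDestroy(old->framefd);       // free the NvBuffer backing the fd
    delete old;
}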
My question is whether mapping a frame fd could cause any conversion.
Hi,
I am not sure about mmap() and v4l2convert.so, but for NvBuffer you should use the APIs defined in nvbuf_utils.h:
/**
 * This method must be used for hardware memory cache sync for the CPU.
 * @param[in] dmabuf_fd DMABUF FD of buffer.
 * @param[in] plane video frame plane.
 * @param[in] pVirtAddr virtual address pointer of the memory-mapped plane.
 *
 * @returns 0 for success, -1 for failure.
 */
int NvBufferMemSyncForCpu (int dmabuf_fd, unsigned int plane, void **pVirtAddr);

/**
 * This method must be used for hardware memory cache sync for the device.
 * @param[in] dmabuf_fd DMABUF FD of buffer.
 * @param[in] plane video frame plane.
 * @param[in] pVirtAddr virtual address pointer of the memory-mapped plane.
 *
 * @returns 0 for success, -1 for failure.
 */
int NvBufferMemSyncForDevice (int dmabuf_fd, unsigned int plane, void **pVirtAddr);

/**
 * This method must be used for getting the memory-mapped virtual address of the plane.
 * @param[in] dmabuf_fd DMABUF FD of buffer.
 * @param[in] plane video frame plane.
 * @param[in] memflag NvBuffer memory flag.
 * @param[out] pVirtAddr virtual address pointer of the memory-mapped plane.
 *
 * @returns 0 for success, -1 for failure.
 */
int NvBufferMemMap (int dmabuf_fd, unsigned int plane, NvBufferMemFlags memflag, void **pVirtAddr);

/**
 * This method must be used to unmap the mapped virtual address of the plane.
 * @param[in] dmabuf_fd DMABUF FD of buffer.
 * @param[in] plane video frame plane.
 * @param[in] pVirtAddr memory-mapped virtual address pointer of the plane.
 *
 * @returns 0 for success, -1 for failure.
 */
int NvBufferMemUnMap (int dmabuf_fd, unsigned int plane, void **pVirtAddr);
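Based on the declarations above, mapping a plane would look roughly like this (a sketch; framefd is the dmabuf fd returned by createNvBuffer(), and NvBufferMem_Read_Write is assumed to be the read/write NvBufferMemFlags value from the same header):

#include "nvbuf_utils.h"

void *dataY = NULL;
if (NvBufferMemMap(framefd, 0 /* Y plane */, NvBufferMem_Read_Write, &dataY) == 0)
{
    // Sync hardware memory into the CPU cache before reading on the CPU.
    NvBufferMemSyncForCpu(framefd, 0, &dataY);

    // ... CPU-side processing of the Y plane ...

    // Flush CPU writes before the hardware (GPU/encoder) touches the buffer again.
    NvBufferMemSyncForDevice(framefd, 0, &dataY);
    NvBufferMemUnMap(framefd, 0, &dataY);
}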
sensor -> ioctl(VIDIOC_DQBUF) -> captured frames in raw format -> VI/ISP -> frames in I420/NV12 format
Is there a way to avoid the VIDIOC_DQBUF call? I see this happening with the argus_camera app too. Does each arrow here mean a read and a write transaction to external memory (DRAM)? That would seriously increase bandwidth usage and degrade performance.
Is there a way for the data to go directly to the VI/ISP, without being routed via external memory?
Hi,
The Argus framework is optimal and performs no extra memory copies. All of these operations are required for sensor frame capture, and they take a reasonable amount of CPU/memory bandwidth.