— Xavier AGX, DS 6.0.1, JetPack 4.6.1 —
Hello everyone,
I’m experiencing high CPU usage and low framerates when processing images from multiple cameras using DeepStream 6.0.1.
Each camera pipeline ends with the following elements:
nvvideoconvert nvbuf-memory-type=4 (surface array memory) ! video/x-raw(memory:NVMM), width=1920, height=1080, framerate=35/1, format=BGRx ! appsink sync=false drop-buffers=true max-buffers=1 emit-signals=1
In the new-sample callback, the NVMM buffer is mapped to an NvBufSurface like this:
static GstFlowReturn on_new_image(GstElement* sink, void* user_data)
{
    if (nullptr != user_data)
    {
        CImageProcessingCamera* receivingCamera = reinterpret_cast<CImageProcessingCamera*>(user_data);
        GstSample* sample = nullptr;
        // Pull the newest sample from the appsink
        g_signal_emit_by_name(sink, "pull-sample", &sample, NULL);
        if (NULL != sample)
        {
            GstBuffer* buffer = gst_sample_get_buffer(sample);
            GstMapInfo info;
            if (gst_buffer_map(buffer, &info, GST_MAP_READ))
            {
                // For NVMM memory, the mapped data is an NvBufSurface descriptor, not raw pixels
                NvBufSurface* surface = (NvBufSurface*)info.data;
                NvBufSurfaceMap(surface, 0, 0, NVBUF_MAP_READ);
                NvBufSurfaceSyncForDevice(surface, 0, 0);
                // Copy plane 0 of the first surface into the camera's frame while holding the exclusive lock
                std::unique_lock<std::shared_mutex> lockOnCameraFrame(receivingCamera->mGpuFrame.mMutex);
                cudaMemcpy(receivingCamera->mGpuFrame.mGpuImage.data, surface->surfaceList[0].mappedAddr.addr[0], receivingCamera->mImageMemorySize, cudaMemcpyKind::cudaMemcpyDefault);
                lockOnCameraFrame.unlock();
                NvBufSurfaceUnMap(surface, 0, 0);
                gst_buffer_unmap(buffer, &info);
                gst_sample_unref(sample);
            }
            ...
        }
        ....
    }
    ...
    return GST_FLOW_OK;
}
When I run the pipeline without the callback, the CPU load is very low. The high CPU load seems to start as soon as we grab the sample in the callback, and the other contributing factor seems to be cudaMemcpy.
cudaMemcpy copies the image data to a memory space shared between the CPU and GPU. Each image is then immediately processed further. I’d prefer to keep copying into the shared memory.
So far, running this setup with four cameras results in most cores being almost fully utilized.
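For a sense of scale, here is a rough estimate of how much data the callbacks copy per second. It is only a sketch: it assumes tightly packed BGRx rows (4 bytes per pixel, no pitch padding) and that every frame at 35 fps is actually copied, so the real mImageMemorySize and throughput may differ. The helper names frameBytes and copyBytesPerSecond are made up for this illustration.

```cpp
#include <cstdint>

// Bytes in one frame, assuming tightly packed rows (no pitch padding).
constexpr std::uint64_t frameBytes(std::uint64_t width,
                                   std::uint64_t height,
                                   std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Bytes copied per second if every frame of every camera is copied once.
constexpr std::uint64_t copyBytesPerSecond(std::uint64_t bytesPerFrame,
                                           std::uint64_t framesPerSecond,
                                           std::uint64_t cameraCount)
{
    return bytesPerFrame * framesPerSecond * cameraCount;
}

// Values taken from the caps above: 1920x1080, BGRx, 35 fps, four cameras.
constexpr std::uint64_t kFrameBytes = frameBytes(1920, 1080, 4);
constexpr std::uint64_t kTotalBytesPerSecond = copyBytesPerSecond(kFrameBytes, 35, 4);

// Checked at compile time: about 8.3 MB per frame and about 1.16 GB/s in total.
static_assert(kFrameBytes == 8294400ULL, "one BGRx 1080p frame");
static_assert(kTotalBytesPerSecond == 1161216000ULL, "four cameras at 35 fps");
```

Under these assumptions the four callbacks together move roughly 1.16 GB every second through cudaMemcpy.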
Is there something I’m missing? Is there a way to reduce the CPU usage? Any help would be appreciated. :)