Extremely slow memcpy to mmap shared region


• Jetson Xavier
• DeepStream version 6.3
• JetPack version 4.6.2
• TensorRT version 8.5.1.7
• Issue Type bugs

Hi,
I’m trying to copy a frame extracted from the DeepStream pipeline into shared memory (mmap) using memcpy. To do this, I do the following:

// Initialize the shared memory with mmap
char name[25];
snprintf(name, sizeof(name), "/shared_mem%d", 0);
size_t total_size = 1500000;

int fd = shm_open(name, O_RDWR | O_CREAT, 0644);

ftruncate(fd, total_size);

void* shared_mem = mmap(NULL, total_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
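
For anyone reusing the setup above: none of the calls check their return values, so a failed shm_open, ftruncate or mmap would go unnoticed. A minimal sketch of the same sequence with error checks follows. The helper name map_shared_region is hypothetical and not part of the original post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): create or open a POSIX
 * shared-memory object of `size` bytes and map it read/write.
 * Returns NULL on failure. */
void *map_shared_region(const char *name, size_t size)
{
    assert(name != NULL && size > 0);

    int fd = shm_open(name, O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        perror("shm_open");
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) == -1) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (mem == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return mem;
}
```

With the values from the post, this would be called as map_shared_region("/shared_mem0", 1500000). On older glibc versions the program may need to be linked with -lrt.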


// Extract the raw frame from the pipeline
NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ);
NvBufSurfaceSyncForCpu (surface, 0, 0);
NvBufSurfaceParams params = surface->surfaceList[i];
  
guint height = surface->surfaceList[0].height;
guint width = surface->surfaceList[0].width;
guint pitch = surface->surfaceList[0].pitch;
  
// Put the extracted image into an OpenCV data structure
void* addr = surface->surfaceList[0].mappedAddr.addr[0];
cv::Mat BGR_mat = cv::Mat(height, width, CV_8UC1, addr, pitch);

// Keep the clone in a named variable so its data pointer stays valid
cv::Mat contiguous = BGR_mat.isContinuous() ? BGR_mat : BGR_mat.clone();
uchar* arr = contiguous.data;
size_t length = contiguous.total() * contiguous.channels();

// This memcpy is extremely slow (2 s for 1 MB, so around 0.5 MB/s)
memcpy((char*) shared_mem + offset, arr, length);

I have also already put the Xavier into performance mode, so that shouldn’t be the cause.

Any ideas on what could be causing this?
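
One detail of the snippet above: the mapped surface is pitched, so when pitch is larger than the row width the cv::Mat is not continuous and the frame is cloned before the memcpy. A rough sketch of copying row by row straight into the destination, which avoids the intermediate clone, is shown below in plain C. The helper name copy_pitched and its parameters are hypothetical, and whether this changes the timing reported here is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copy `height` rows of
 * `row_bytes` bytes each from a pitched source, whose rows start `pitch`
 * bytes apart, into a tightly packed destination.
 * Returns the number of bytes written. */
size_t copy_pitched(unsigned char *dst, const unsigned char *src,
                    size_t row_bytes, size_t height, size_t pitch)
{
    assert(pitch >= row_bytes);

    if (pitch == row_bytes) {
        /* No padding between rows: one copy covers the whole frame. */
        memcpy(dst, src, row_bytes * height);
    } else {
        /* Skip the padding at the end of each source row. */
        for (size_t y = 0; y < height; ++y)
            memcpy(dst + y * row_bytes, src + y * pitch, row_bytes);
    }
    return row_bytes * height;
}
```

For the single-channel 8-bit frame in the post, row_bytes would be width, and dst would be the shared-memory pointer plus offset.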

What do you mean by "Extremely slow memcpy to mmap shared region"? How slow?

It takes 2000 ms to copy 2 MB.

It doesn’t seem like the problem comes from mmap. Using memcpy to copy 1 MB between any two regions seems to be just as slow.

Is the content correct after memcpy?

Yes it is

Executing memcpy like this also takes 1.5 seconds for 1 MB:

void* addr1 = malloc(1000000);
void* addr2 = malloc(1000000);

// this line takes 1.5 seconds
memcpy(addr2, addr1, 1000000);
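
To measure only the copy itself, a small sketch like the following could be used. It touches both buffers before timing, because freshly malloc'ed memory is usually not backed by physical pages until first access, so page faults would otherwise be counted as part of the memcpy. The helper name time_memcpy is hypothetical and not from this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the original post): time one memcpy of
 * `size` bytes between two heap buffers. Returns the elapsed time in
 * seconds, or -1.0 if allocation fails. */
double time_memcpy(size_t size)
{
    assert(size > 0);

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not counted. */
    memset(src, 1, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Read the result so the copy is less likely to be optimized away. */
    volatile unsigned char sink = dst[size - 1];
    (void)sink;

    free(src);
    free(dst);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_memcpy(1000000) from inside the affected process and from a standalone program would show whether the slowdown is specific to the application's environment.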

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

It does not take that long on our board with your code.

time ./a.out
real 0m0.007s
user 0m0.000s
sys 0m0.004s

#include <stdlib.h>
#include <string.h>

int main(int argc, char ** argv)
{
    void* addr1 = malloc(1000000);
    void* addr2 = malloc(1000000);

    // this line takes 1.5 seconds
    memcpy(addr2, addr1, 1000000);
}
