Memcpy is quite slow

I use the following code to measure the performance of a memcpy of 1920x1080 bytes, and it can take up to 5 ms to finish. This is quite slow. How can I speed it up?

#include <chrono>
#include <cstdlib>
#include <cstring>
#include <iostream>

typedef std::chrono::high_resolution_clock Clock;

int main(int argc, char *argv[]) {
    char *memA;
    char *memB;
    char *ptrA;
    char *ptrB;
    for (int i = 0; i < 10; i++) {
        // Fresh buffers are allocated on every iteration.
        memA = (char *)malloc(6220800);
        memB = (char *)malloc(6220800);
        auto before_ts = Clock::now();
        ptrA = memA;
        ptrB = memB;
        // Copy 1080 rows of 1920 bytes each.
        for (int j = 0; j < 1080; j++) {
            std::memcpy(ptrB, ptrA, 1920);
            ptrA = ptrA + 1920;
            ptrB = ptrB + 1920;
        }
        // Note: this print happens inside the timed region.
        std::cout << "after: " << i << "\n";
        auto after_ts = Clock::now();
        std::cout << "i: " << i << " Time: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(
                         after_ts - before_ts)
                         .count()
                  << " ms\n";
        std::free(memA);
        std::free(memB);
    }

    return 0;
}

after: 0
i: 0 Time: 1 ms
after: 1
i: 1 Time: 5 ms
after: 2
i: 2 Time: 5 ms
after: 3
i: 3 Time: 5 ms
after: 4
i: 4 Time: 5 ms
after: 5
i: 5 Time: 5 ms
after: 6
i: 6 Time: 5 ms
after: 7
i: 7 Time: 5 ms
after: 8
i: 8 Time: 5 ms
after: 9
i: 9 Time: 5 ms

Please try boosting the memory clocks.

sudo su
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate

After following your suggestions, I ran the memory copy test again, and the results are almost the same as before the changes:

./test
after: 0
i: 0 Time: 2 ms
after: 1
i: 1 Time: 4 ms
after: 2
i: 2 Time: 4 ms
after: 3
i: 3 Time: 4 ms
after: 4
i: 4 Time: 4 ms
after: 5
i: 5 Time: 4 ms
after: 6
i: 6 Time: 4 ms
after: 7
i: 7 Time: 4 ms
after: 8
i: 8 Time: 4 ms
after: 9
i: 9 Time: 4 ms

./test
after: 0
i: 0 Time: 5 ms
after: 1
i: 1 Time: 5 ms
after: 2
i: 2 Time: 5 ms
after: 3
i: 3 Time: 5 ms
after: 4
i: 4 Time: 5 ms
after: 5
i: 5 Time: 5 ms
after: 6
i: 6 Time: 5 ms
after: 7
i: 7 Time: 5 ms
after: 8
i: 8 Time: 5 ms
after: 9
i: 9 Time: 5 ms

How about also running the system in performance mode?

sudo nvpmodel -m 0
sudo jetson_clocks&

My system is already in performance mode. Can you compile my test app and profile the performance on your side so we can compare the results?

Thanks,

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

It would be better if you could provide the binary.

Thanks