You are right. There was a bug in that open-source benchmark.
I have fixed it.
The results are in the table below.
Xavier

espresso@zerofly-desktop:/mnt/nvidia/benchmark$ sudo jetson_clocks
espresso@zerofly-desktop:/mnt/nvidia/benchmark$ ./copybenchmark
copying 1953 MB
time = 0.080000 1250.000028 millions of uints/sec
[memcpy] time = 0.032000 3124.999852 millions of uints/sec
espresso@zerofly-desktop:/mnt/nvidia/benchmark

Orin

espresso@espresso-desktop:/mnt/nvidia/benchmark/copybenchmark$ sudo jetson_clocks
espresso@espresso-desktop:/mnt/nvidia/benchmark/copybenchmark$ ./copybenchmark
copying 1953 MB
time = 0.163294 612.392363 millions of uints/sec
[memcpy] time = 0.024657 4055.643451 millions of uints/sec
espresso@espresso-desktop:/mnt/nvidia/benchmark/copybenchmark$
So the NEON copy really is slower on Orin than on Xavier (about 612 vs. 1250 million uints/sec), even though memcpy is faster on Orin. Right?
Hi,
We have checked the performance of memcpy() internally, and it is expected that Orin performs better there. We have not checked vst1q_u32() and do not have much experience with it. It appears to be an experimental function according to the Rust documentation (vst1q_u32 in core::arch::arm - Rust).
If calling memcpy() is fine for your use case, we suggest calling it instead of vst1q_u32().
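For reference, here is a minimal sketch of the two copy paths being compared. It is not the copybenchmark source: the function names copy_u32_neon, copy_u32_memcpy and seconds_for_copy are hypothetical, and the timing uses the standard clock() call, which is only an assumption about how such a benchmark might measure time. In C, vld1q_u32() and vst1q_u32() come from <arm_neon.h>; on targets without NEON the sketch falls back to a scalar loop so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy n uint32_t values with NEON loads/stores
 * when NEON is available, otherwise with a plain scalar loop. */
void copy_u32_neon(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
#if defined(__ARM_NEON)
    /* One 128-bit register holds four 32-bit lanes. */
    for (; i + 4 <= n; i += 4) {
        vst1q_u32(dst + i, vld1q_u32(src + i));
    }
#endif
    /* Remaining tail elements (or every element on non-NEON targets). */
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical helper: the same copy done through memcpy(). */
void copy_u32_memcpy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(uint32_t));
}

typedef void (*copy_fn)(uint32_t *, const uint32_t *, size_t);

/* Hypothetical helper: CPU time in seconds taken by one call of fn. */
double seconds_for_copy(copy_fn fn, uint32_t *dst, const uint32_t *src, size_t n)
{
    clock_t start = clock();
    fn(dst, src, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare the two paths, call seconds_for_copy() with each function on the same large buffer (for example after running sudo jetson_clocks) and divide the element count by the elapsed time. Repeating each measurement several times should give a more stable number than a single run.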