Strange results for sysbench - TK1 vs TX1


Can anybody explain this? The EMC, CPU, and GPU frequencies are all set to maximum, but:

sysbench --test=memory --memory-block-size=1M --memory-total-size=3G run

TK1: 6448 MB/s (DDR3L)
TX1: 3607 MB/s (LPDDR4)

Why is the TX1 result lower than the TK1's?
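To make the comparison concrete, here is a rough Python approximation of this kind of memory test. It assumes a sequential-write workload: each thread overwrites its own 1 MiB buffer with memset until its share of the 3 GiB total has been written, and throughput is total data divided by wall-clock time. This is a sketch under those assumptions, not sysbench's implementation. write_bandwidth_mb_s is a hypothetical helper, and its absolute numbers will not match sysbench's.

```python
import ctypes
import threading
import time

MIB = 1024 * 1024


def write_bandwidth_mb_s(block_size=1 * MIB, total_size=3 * 1024 * MIB, num_threads=1):
    """Hypothetical sysbench-like sequential write test; returns MiB/s.

    Each thread owns one block-sized buffer and overwrites it with memset
    until its share of total_size has been written.
    """
    # Split the total number of blocks as evenly as possible across threads.
    total_blocks = total_size // block_size
    per_thread = [total_blocks // num_threads] * num_threads
    for i in range(total_blocks % num_threads):
        per_thread[i] += 1

    buffers = [ctypes.create_string_buffer(block_size) for _ in range(num_threads)]

    def worker(buf, blocks):
        addr = ctypes.addressof(buf)
        for _ in range(blocks):
            # ctypes.memset is a plain C call; ctypes releases the GIL
            # around it, so several threads can write concurrently.
            ctypes.memset(addr, 0x5A, block_size)

    threads = [threading.Thread(target=worker, args=(b, n))
               for b, n in zip(buffers, per_thread)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    return (total_blocks * block_size / MIB) / elapsed


if __name__ == "__main__":
    for n in (1, 4):
        print(f"{n} thread(s): {write_bandwidth_mb_s(num_threads=n):.0f} MiB/s")
```

Running it on both boards with 1 and 4 threads would give a second data point. Because the inner loop is a bare memset, it may help separate raw memory throughput from benchmark-harness overhead.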

One of the first things I’d consider is that there may be unusual interactions between the 32-bit program (and the 32-bit libraries it links to) and the 64-bit kernel.
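One quick way to confirm whether a binary is a 32-bit ARM build or a native AArch64 build is to read its ELF header. Byte 4 (EI_CLASS) gives the word size, and the 16-bit e_machine field at offset 18 identifies the architecture (40 for ARM, 183 for AArch64). The helper below, elf_info, is a hypothetical sketch; running the standard “file” utility on the binary reports the same information.

```python
import struct

# ELF e_machine values for the two ARM ABIs (from the ELF specification).
EM_ARM = 40
EM_AARCH64 = 183


def elf_info(path):
    """Return (bits, machine) for an ELF file, e.g. (32, 'ARM')."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
    bits = {1: 32, 2: 64}.get(header[4])
    # e_ident[EI_DATA]: 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64 headers.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    name = {EM_ARM: "ARM", EM_AARCH64: "AArch64"}.get(machine, str(machine))
    return bits, name
```

For example, elf_info("/usr/bin/sysbench") (path assumed) would be expected to return (32, 'ARM') if the installed sysbench is a 32-bit build running on the 64-bit kernel.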

Profiling in any detail would be difficult, but on a TX1 “strace -c” shows a LOT of time spent in futex and clone (85% futex, 10% clone: together about 95% of total system call time), while read and lseek (the work the benchmark actually requires) account for only about the remaining 5%. That is a lot of overhead, with little time truly spent doing “memory things”.

On a TK1, futex time drops to 57% and clone does not even register as significant (in both cases clone is called only once); most of the remaining TK1 time is in mmap2 (34%). Although clone is significant overhead on a TX1 and insignificant on a TK1, everything points to futex calls being the bottleneck.
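To repeat this breakdown, the per-syscall “% time” column of an “strace -c” summary can be summed for the calls of interest. The sketch below assumes the usual strace -c table layout (percentage in the first column, syscall name in the last, a final “total” row); syscall_shares and share_of are hypothetical helpers, and the exact column set varies between strace versions.

```python
def syscall_shares(summary):
    """Map syscall name -> '% time' from an strace -c summary table."""
    shares = {}
    for line in summary.splitlines():
        fields = line.split()
        # Data rows have at least: % time, seconds, usecs/call, calls, syscall.
        if len(fields) < 5:
            continue
        try:
            pct = float(fields[0])
        except ValueError:
            continue  # header or "------" separator line
        name = fields[-1]
        if name != "total":
            shares[name] = pct
    return shares


def share_of(shares, names):
    """Sum the '% time' of the given syscalls (missing ones count as 0)."""
    return sum(shares.get(n, 0.0) for n in names)
```

For example, after running strace -c -f -o summary.txt sysbench --test=memory --memory-block-size=1M --memory-total-size=3G run (-f makes strace follow the worker threads), passing the contents of summary.txt to syscall_shares and then calling share_of with ["futex", "clone"] gives the combined futex plus clone share.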

Suspecting that some competing process might be inflating the futex times, I reniced the benchmark on the TX1 to -1 and then to -2, but the results did not change. I don’t know what the cause is, but it seems the benchmark on a TX1 is really measuring kernel inefficiencies rather than actual memory performance.

Thanks for the reply. That makes sense! So… we need a real 64-bit OS! :)

Another interesting thing: here are the benchmark results with 4 threads (--num-threads=4):

Jetson TK1 : 11826 MB/s
Jetson TX1 : 13827 MB/s
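A quick calculation from the reported numbers shows how differently the two boards scale: dividing the 4-thread result by the 1-thread result gives the speedup, and dividing that by 4 gives the parallel efficiency. The snippet only uses figures posted in this thread.

```python
# Results reported in this thread (MB/s, as printed by sysbench).
results = {
    "Jetson TK1": {1: 6448, 4: 11826},
    "Jetson TX1": {1: 3607, 4: 13827},
}


def scaling(single, multi, threads):
    """Return (speedup, parallel efficiency) going from 1 to `threads` threads."""
    speedup = multi / single
    return speedup, speedup / threads


for board, r in results.items():
    speedup, eff = scaling(r[1], r[4], 4)
    print(f"{board}: {speedup:.2f}x speedup, {eff:.0%} of linear")
```

It prints 1.83x (46% of linear) for the TK1 and 3.83x (96% of linear) for the TX1. One reading, consistent with the strace results, is that a single TX1 thread is held back by per-thread overhead rather than by LPDDR4 bandwidth, since adding threads scales almost perfectly.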