I have the Xavier NX running in the 20W 6-core mode with jetson_clocks.sh enabled. I also tried the 20W 2-core mode, which made it slightly faster.
Using this script, which benchmarks the inference time of shufflenet_v2_x0_5 at batch size 1 on CUDA, the Xavier NX takes around 30 ms per inference in the 20W 6-core mode and around 25 ms in the 20W 2-core mode. The TX2 has an inference time of around 24 ms. Is it normal for the Xavier NX to be slower than the TX2 under these conditions?
When would the Xavier NX be faster than the TX2? For FP32, do the two have similar speeds?
Even though the two have similar speeds in PyTorch, would I see any difference in inference time running under TensorRT?
Does it matter that my Xavier NX is running off an SD card rather than an NVMe SSD?
At a batch size of 100, the Xavier NX is about 2.5 times faster, but I do not plan on using such a large batch size, and even then the performance is underwhelming.
using_prewritten_benchmark.py (1.1 KB)