Jetson Xavier NX slower than Jetson TX2 at PyTorch inference

I have the Xavier NX running in the 20W 6-core mode with jetson_clocks.sh applied. I also tried the 20W 2-core mode, which made it slightly faster.

Using the attached script, which benchmarks the inference time of shufflenet_v2_x0_5 at batch size 1 on CUDA, the Xavier NX takes around 30 ms per inference in the 20W 6-core mode and around 25 ms in the 20W 2-core mode. The TX2 has an inference time of around 24 ms. Is it normal for the Xavier NX to be slower than the TX2 under these conditions?

When would the Xavier NX be faster than the TX2? For FP32, do the two have similar speeds?

Even though the two have similar speeds in PyTorch, would I see any time difference running in TensorRT?

Does it matter that my Xavier NX is running off of an SD card rather than an NVMe SSD?

At a batch size of 100, the Xavier NX is about 2.5 times faster, but I do not plan on using such a large batch size, and even then the performance is still underwhelming.
using_prewritten_benchmark.py (1.1 KB)
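
The attached script is not reproduced in the thread; a minimal sketch of this kind of batch-size-1 benchmark (the model choice matches the post, but the warm-up and iteration counts are assumptions) would look like:

```python
import time

import torch
import torchvision.models as models

# Build the model on the GPU in eval mode; random weights are fine
# for a pure latency measurement.
model = models.shufflenet_v2_x0_5(pretrained=False).eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    # Warm-up iterations so lazy CUDA initialization and cuDNN
    # autotuning do not distort the measurement.
    for _ in range(20):
        model(x)
    torch.cuda.synchronize()

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    # Wait for all queued kernels before stopping the clock.
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"{elapsed / iters * 1000:.1f} ms per inference")
```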

Hi,

Could you run tegrastats first to check whether GPU utilization reaches 99%?

Thanks.

Hi,

Yes, I can confirm that GPU utilization is at 99%. I ran some other tests: with TensorRT at FP32, the Xavier is about two times faster, and at FP16 it is around three times faster. Still, shouldn't the Xavier be faster than this?

Hi,

Xavier should be faster than TX2.

But if the model is memory-bound, performance may depend on memory/storage bandwidth.
In general, we recommend using TensorRT first to get optimized performance.
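
One common path (sketched here with hypothetical file names, not taken from this thread) is to export the PyTorch model to ONNX and then build and time a TensorRT engine on the device with trtexec, e.g. `/usr/src/tensorrt/bin/trtexec --onnx=shufflenet.onnx --fp16`:

```python
import torch
import torchvision.models as models

# Export the same model used in the benchmark to ONNX on the CPU.
model = models.shufflenet_v2_x0_5(pretrained=False).eval()
dummy = torch.randn(1, 3, 224, 224)  # batch size 1, as in the benchmark

torch.onnx.export(
    model,
    dummy,
    "shufflenet.onnx",  # hypothetical output file name
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```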

Thanks.
