Automatic SIMD when running deep learning workloads on TX2 ARM CPU?


So, I was able to train some RNN speech-to-text workloads in PyTorch on the TX2 GPU. I can also do inference on both the TX2 GPU and the TX2 ARM A57 CPU.
The question is: when these RNN workloads are instructed to run inference on the TX2 A57 CPU, do you know if SIMD vectorization occurs by default? Or do we need to do something special to enable SIMD?

Thank you,


This depends on the PyTorch implementation.
With our recommended PyTorch package, inference uses GPU acceleration.
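
If you want to see what the CPU path has to work with, a rough check like the one below may help. It is only a sketch: the helper names (cpu_simd_flags, has_neon, report) are made up for illustration, and it assumes a Linux system that exposes /proc/cpuinfo. It tells you whether the CPU advertises NEON (reported as "asimd" on AArch64 kernels) and prints the PyTorch build configuration if PyTorch is installed. It does not prove that any particular RNN kernel is vectorized.

```python
import platform


def cpu_simd_flags(cpuinfo_text):
    """Collect CPU feature flags from 'Features' (ARM) or 'flags' (x86) lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return flags


def has_neon(cpuinfo_text):
    """True if the CPU advertises NEON / Advanced SIMD.

    AArch64 kernels list it as 'asimd'; 32-bit ARM kernels list it as 'neon'.
    """
    return bool({"asimd", "neon"} & cpu_simd_flags(cpuinfo_text))


def report():
    """Print what the hardware advertises and how PyTorch was built."""
    print("machine:", platform.machine())
    try:
        with open("/proc/cpuinfo") as f:
            print("NEON advertised:", has_neon(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo")

    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    # Build-time settings: compiler flags, BLAS backend, threading library, etc.
    print(torch.__config__.show())
    print("intra-op threads:", torch.get_num_threads())
```

Calling report() on the TX2 prints the results. The compiler flags and BLAS backend listed in the build configuration are the places to look for hints about how the CPU kernels were compiled.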

You can find more implementation details in their GitHub:

On Jetson, it’s recommended to convert your model into a TensorRT engine instead.
TensorRT optimizes DNN inference based on the properties of each layer.
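
One common route to a TensorRT engine is to export the model to ONNX and then build the engine with the trtexec tool that ships with TensorRT. The sketch below only assembles and runs that command. It assumes you already have an ONNX file, the file names are placeholders, and the helper names (build_trtexec_command, build_engine) are made up for illustration. Whether a given RNN speech-to-text model exports and converts cleanly depends on the layers it uses.

```python
import shutil
import subprocess


def build_trtexec_command(onnx_path, engine_path, fp16=False):
    """Return the trtexec argument list for building an engine from ONNX."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # Allow TensorRT to pick FP16 kernels where it considers them suitable.
        cmd.append("--fp16")
    return cmd


def build_engine(onnx_path, engine_path, fp16=False):
    """Run trtexec if it is on PATH; return its exit code, or None if missing."""
    if shutil.which("trtexec") is None:
        # On JetPack installs the binary is often under /usr/src/tensorrt/bin
        # and may need to be added to PATH first.
        return None
    cmd = build_trtexec_command(onnx_path, engine_path, fp16=fp16)
    return subprocess.run(cmd, check=False).returncode
```

For example, build_engine("model.onnx", "model.engine", fp16=True) would write model.engine if the conversion succeeds.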