So, I was able to train some RNN speech-to-text workloads in PyTorch on the TX2 GPU, and I can run inference on both the TX2 GPU and the TX2's ARM Cortex-A57 CPU.
The question is: when these RNN workloads run inference on the TX2 A57 CPU, does SIMD vectorization occur by default? Or do we need to do something special to enable SIMD?
On Jetson, it's recommended to convert your model into a TensorRT engine instead.
TensorRT optimizes DNN inference based on each layer's properties.
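As a rough sketch of that workflow: one common path is to export the PyTorch model to ONNX and then build a TensorRT engine with the `trtexec` tool that ships with JetPack. The file names (`model.onnx`, `model.engine`) are placeholders, and whether the export succeeds depends on the specific RNN ops in your model.

```shell
# Export the PyTorch model to ONNX first (in Python, via torch.onnx.export),
# then build a serialized TensorRT engine on the Jetson itself:
trtexec --onnx=model.onnx --saveEngine=model.engine

# Optionally enable FP16 to use the TX2 GPU's half-precision throughput:
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```

Building the engine on the target device matters because TensorRT tunes kernel selection for the GPU it runs on, so an engine built elsewhere is not portable to the TX2.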