I want to deploy super-resolution DNNs on an NVIDIA Jetson AGX Orin 32GB.
The super-resolution code is written in PyTorch, and it uses two data types (float32 and int16).
So I would like to ask the following five questions.
Q1) If I port this code without quantization, will it run on the CUDA cores?
Q2) What should I do to make this code run on the Tensor Cores?
Q3) What should I do to make this code run on the DLA?
Q4) If I convert my FP32 code to TF32, how much will the performance improve on the Jetson?
Q5) If I convert my INT16 code to FP16, how much will the performance improve on the Jetson?
1. If you run inference on the GPU, the model will use the CUDA cores.
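For reference, a minimal PyTorch inference sketch. `SRNet` here is a hypothetical ESPCN-style stand-in for your model; moving the model and input to the CUDA device is all that is needed for the GPU (CUDA core) path:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the super-resolution network (ESPCN-style).
class SRNet(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a 2x upscale
        )

    def forward(self, x):
        return self.body(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SRNet().to(device).eval()

with torch.no_grad():
    lr = torch.rand(1, 3, 64, 64, device=device)  # low-res input
    sr = model(lr)                                # runs on CUDA cores when device is "cuda"

print(sr.shape)  # torch.Size([1, 3, 128, 128])
```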
2. TensorRT will use the Tensor Cores whenever a layer can run on them.
3. Use TensorRT or the cuDLA API.
4 & 5. Please check the data formats supported by TensorRT below. For example, INT16 is not supported.
Performance is expected to increase when quantization is applied.
But the speedup is model- and layer-dependent, so please benchmark it directly to get an idea.
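To get a first-order idea before building TensorRT engines, you can compare precisions directly in PyTorch. This is a rough timing harness (the toy model is a placeholder for your network), not a substitute for trtexec measurements:

```python
import time
import torch
import torch.nn as nn

def time_inference(model, x, iters=50):
    # Synchronize around the timed region so asynchronous GPU kernels are counted.
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
).to(device).eval()
x = torch.rand(1, 3, 256, 256, device=device)

fp32_ms = time_inference(model, x) * 1e3
print(f"FP32: {fp32_ms:.2f} ms")

if device.type == "cuda":
    # FP16 comparison only makes sense on the GPU (Tensor Cores on Orin).
    half_ms = time_inference(model.half(), x.half()) * 1e3
    print(f"FP16: {half_ms:.2f} ms")
```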