I want to deploy super resolution DNNs on NVIDIA Jetson AGX Orin 32GB.
The super-resolution code is written in PyTorch, and there are two data types (float32 and int16).
So I want to ask the following 5 questions.
Q1) If I port this code without quantization, will it run on the CUDA cores?
Q2) What should I do to make this code run on the Tensor Cores?
Q3) What should I do to make this code run on the DLA?
Q4) If I convert my FP32 code to TF32, how much will the performance improve on the Jetson?
Q5) If I convert my INT16 code to FP16, how much will the performance improve on the Jetson?
1. If you run inference on the GPU, it will use the CUDA cores.
2. TensorRT will use the Tensor Cores when a layer can run on them.
3. Use TensorRT or the cuDLA API.
4/5. Please check the data formats supported by TensorRT below. For example, INT16 is not supported.
It’s expected that the performance will increase when quantization is applied, but the speedup ratio is model/layer dependent, so please test it directly to get an idea.
These support matrices provide a look into the supported platforms, features, and hardware capabilities of the NVIDIA TensorRT 8.6.0 Early Access (EA) APIs, parsers, and layers.
April 11, 2023, 3:11am
April 26, 2023, 2:46pm
Check out the DLA github page for samples and resources: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.
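As a concrete starting point for the DLA path (file names here are placeholders, and this must be run on the Jetson itself), a trtexec build targeting a DLA core might look like:

```shell
# Build a TensorRT engine that targets DLA core 0 in FP16.
# The DLA does not support FP32 or INT16; with --allowGPUFallback,
# layers the DLA cannot run fall back to the GPU.
trtexec --onnx=sr_model.onnx \
        --useDLACore=0 \
        --fp16 \
        --allowGPUFallback \
        --saveEngine=sr_dla.engine
```

trtexec also reports per-layer placement during the build, which shows which parts of the network actually landed on the DLA versus the GPU.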
We have a FAQ page that addresses some common questions that we see developers run into: