According to the forum thread "Tensor core of Jetson AGX Orin" (Jetson & Embedded Systems / Jetson AGX Orin, NVIDIA Developer Forums), I understood the SM arch of Orin to be GA10x. But if we calculate the dense FP16 tensor core throughput based on GA10x, it should be 1.3 GHz (64GB dev kit) * 64 tensor cores * 128 FMA per tensor core per clock * 2 ops per FMA = 21.3 TFLOPS, and the sparse INT8 throughput should be 21.3 * 2 * 2 = 85 TOPS. However, the spec says the AGX Orin 64GB has 170 TOPS of sparse INT8 compute.
After searching further, I found that the SM arch of Orin is actually GA10b. According to the thread "The tensor core performance detail of Jetson AGX Orin 32GB" (Jetson & Embedded Systems / Jetson AGX Orin, NVIDIA Developer Forums), the number of dense FP16 FMA operations per tensor core on GA10b is 256 (versus 128 on GA10x), so 85 * 2 = 170 TOPS, which matches the figure in Orin's spec doc.
My question is: where can I get the spec doc for GA10b? I'm wondering about the SM arch of GA10b and the design of its CUDA cores, and I'd like to know the total FP16 computing capability of the 2048 CUDA cores.
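To double-check the arithmetic above, here is a small Python sketch. The function names are my own, and I am assuming that one factor of 2 in the INT8 figure comes from INT8 running at twice the FP16 rate and the other from structured sparsity; the clock, tensor core count, and FMA-per-clock values are the ones quoted in this post.

```python
def dense_fp16_tflops(clock_ghz, tensor_cores, fma_per_core_per_clock):
    """Dense FP16 tensor throughput in TFLOPS (1 FMA counted as 2 ops)."""
    return clock_ghz * 1e9 * tensor_cores * fma_per_core_per_clock * 2 / 1e12


def sparse_int8_tops(fp16_tflops):
    """Sparse INT8 TOPS, assuming x2 for INT8 vs FP16 and x2 for sparsity."""
    return fp16_tflops * 2 * 2


# Values quoted in the post: 1.3 GHz, 64 tensor cores on the 64GB dev kit.
ga10x_fp16 = dense_fp16_tflops(1.3, 64, 128)  # GA10x assumption: 128 FMA/clock
ga10b_fp16 = dense_fp16_tflops(1.3, 64, 256)  # GA10b: 256 FMA/clock

print(f"GA10x: {ga10x_fp16:.1f} TFLOPS FP16, {sparse_int8_tops(ga10x_fp16):.0f} TOPS sparse INT8")
print(f"GA10b: {ga10b_fp16:.1f} TFLOPS FP16, {sparse_int8_tops(ga10b_fp16):.0f} TOPS sparse INT8")
```

With 256 FMA per tensor core per clock, the sparse INT8 result comes out at roughly 170 TOPS, which is the number in the spec.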
Thanks.
Hi,
You can find the spec in the below link directly:
Thanks.
Sorry, I didn’t find any datasheets or whitepapers about GA10b on this website. BTW, I’d also like to know the design details of the DLA. Could you share any docs?
Hi,
GA10b is the Orin series (sm-87).
What kind of design details do you want to find?
Below is the technical brief for the Orin series and you can find DLA info on page 7.
Thanks
Hi,
Yes, I’ve already read all the design docs on the Orin website and some of the Ampere arch whitepapers, but I never found any detailed information about GA10b.
I’d like to know the SM arch of GA10b, like this image:
Or at least a specific description of the compute capability of the GA10b SM, for both CUDA cores and Tensor Cores.
As for DLA, I’d like to know all the operations it supports, how to tell whether an engine runs on the GPU or the DLA, and in which data formats. As far as I know, some CV models can be run on DLA in sparse INT8 format, but I’m wondering whether DLA can also be used for dense FP16 compute.
Thanks
Hi,
We need to check with the internal team if any public SM info can be shared.
Whether an engine runs on DLA or GPU is controlled by the user.
When building an engine with the TensorRT API, you need to provide the placement info as well (default=GPU).
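As a rough illustration of the placement step, below is a minimal sketch using the TensorRT Python API. The helper names `make_dla_config` and `dla_eligibility` are made up for this example, and it assumes you already have a `builder` and a parsed `network`. The FP16 flag is set because DLA needs a reduced precision mode (FP16 or INT8), and GPU fallback is enabled so unsupported layers do not fail the build.

```python
def make_dla_config(builder, dla_core=0):
    """Return a builder config that places layers on a DLA core by default."""
    import tensorrt as trt  # imported lazily so the sketch loads without TensorRT

    config = builder.create_builder_config()
    # DLA runs in FP16 or INT8, so a reduced-precision flag must be enabled.
    config.set_flag(trt.BuilderFlag.FP16)
    # Placement info: the default device is GPU unless changed here.
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core
    # Layers the DLA cannot run fall back to the GPU.
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    return config


def dla_eligibility(network, config):
    """Map each layer name to True if TensorRT reports it can run on DLA."""
    report = {}
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        report[layer.name] = bool(config.can_run_on_DLA(layer))
    return report
```

Note that `dla_eligibility` only reports which layers are eligible for DLA under this config; it is not a record of where each layer ends up in the built engine.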
The DLA layer support matrix can be found in the below link:
Thanks.