Please provide the following info (tick the boxes after creating this topic):
Software Version
[v] DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other
Target Operating System
[v] Linux
QNX
other
Hardware Platform
[v] DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other
SDK Manager Version
[v] 1.9.10816
other
Host Machine Version
[v] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
According to this table, the performance of the DRIVE Orin iGPU is 167 TOPS for INT8 and 5.2 TFLOPS for FP32. Does the 167 TOPS figure refer to the Tensor Core performance, and can the 5.2 TFLOPS be considered the CUDA core performance? If that is correct, can you explain how you calculated and derived 167 TOPS (64 Tensor Cores) and 5.2 TFLOPS (2048 CUDA cores)?
@SivaRamaKrishnaNV Thank you!
As I understand it, the 256 used to calculate the OPS for FP32 is the 128 CUDA cores per SM multiplied by the “sparse matrix factor” of 2, right?
But I don’t understand how the 8192 for the INT8 OPS came about. Can you please explain?
Dear @soohyung.zhang,
As I understand it, the 256 used to calculate the OPS for FP32 is the 128 CUDA cores per SM multiplied by the “sparse matrix factor” of 2, right?
Correct. Each CUDA core performs 1 FMA (2 operations) per cycle.
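For reference, a minimal sketch of the FP32 peak-throughput arithmetic under the assumptions discussed above (128 CUDA cores per SM, 1 FMA = 2 operations per core per cycle). The 16-SM count and the ~1.3 GHz GPU clock are assumed values used only to reproduce the quoted ~5.2 TFLOPS figure, not taken from this thread.

```python
# Hedged sketch: FP32 peak throughput of the Orin iGPU.
# Assumptions (not from this thread): 16 SMs, 128 CUDA cores per SM,
# 1 FMA per CUDA core per cycle (= 2 FLOPs), ~1.3 GHz GPU clock.
num_sms = 16
cuda_cores_per_sm = 128
flops_per_fma = 2                      # multiply + add
clock_ghz = 1.3                        # assumed clock

flops_per_sm_per_cycle = cuda_cores_per_sm * flops_per_fma   # 256
peak_tflops = num_sms * flops_per_sm_per_cycle * clock_ghz / 1000
print(f"FP32 peak: {peak_tflops:.2f} TFLOPS")   # ~5.3 TFLOPS
```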
I don’t understand how the 8192 for the INT8 OPS came about. Can you please explain?
@SivaRamaKrishnaNV Thanks for your kind reply.
Finally, I would like to ask one more fundamental question. Can you tell me why the IMMA ops per cycle are 32 times the FMA ops? Is this the effect of accelerators such as the Tensor Cores? I don’t see the Tensor Cores in your formula.
Dear @soohyung.zhang,
I mentioned giga-ops per SM in the formula.
Each SM has 4 Tensor Cores, so the 8192 giga-ops figure is the effect (combined contribution) of the Tensor Cores only.
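A similar hedged sketch for the INT8 Tensor Core figure: it assumes 16 SMs, 4 Tensor Cores per SM, 2048 INT8 ops per Tensor Core per cycle (sparsity included), and a ~1.275 GHz clock. The per-Tensor-Core ops count and the clock are inferred only to reproduce the 8192 ops/SM and ~167 TOPS figures quoted in this thread, not taken from an NVIDIA document.

```python
# Hedged sketch: INT8 Tensor Core peak throughput of the Orin iGPU.
# Assumptions (not from this thread): 16 SMs, 4 Tensor Cores per SM,
# 2048 INT8 ops per Tensor Core per cycle (with sparsity),
# ~1.275 GHz GPU clock chosen to reproduce the quoted ~167 TOPS.
num_sms = 16
tensor_cores_per_sm = 4
int8_ops_per_tc_per_cycle = 2048       # assumed, sparsity included
clock_ghz = 1.275                      # assumed clock

int8_ops_per_sm_per_cycle = tensor_cores_per_sm * int8_ops_per_tc_per_cycle  # 8192
peak_tops = num_sms * int8_ops_per_sm_per_cycle * clock_ghz / 1000
print(f"INT8 peak: {peak_tops:.0f} TOPS")       # ~167 TOPS

# Ratio of the Tensor Core path to the FP32 CUDA core path, per SM per cycle:
print(int8_ops_per_sm_per_cycle // 256)         # 32
```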