SM Unit Structure for Jetson Xavier Family Devices

Hi,

I’d like to understand the SM unit structure of the Jetson Xavier family devices.

Our understanding is like below;

Question:
Is this understanding correct?

Best regards,

hello mat-h,

Those total CUDA core counts are correct for Jetson AGX Xavier (512) and Xavier NX (384).
You should also review the System-on-Module Data Sheet for details. Thanks.

Hi,

Thank you very much for your kindness.
I really appreciate your help.

In both documents you linked, we found the same description:

  • It is comprised of Texture Processing Clusters (TPC), with
    each TPC containing two SM units
  • Each SM is partitioned into four separate processing blocks (referred to as SMPs)

So our understanding is as follows:

  • Jetson AGX Xavier has 2xSM
    each SM has 4xSMPs
    each SMP has 64 CUDA cores
    → total 512 CUDA cores

  • Jetson Xavier NX has 2xSM
    each SM has 4xSMPs
    each SMP has 48 CUDA cores
    → total 384 CUDA cores

Question:
Is this understanding correct?

Best regards,

hello mat-h,

That’s not exactly correct.
It also depends on the data precision: there are different cores for different variable types (e.g. FP32, FP64, etc.).
Please also refer to the CUDA Toolkit Documentation.
Thanks.

Hi,

Thank you very much for your kindness.
I really appreciate your help.

I’d like to focus on clarifying what “CUDA core” means on Jetson AGX Xavier.
Our understanding is as follows:

  • Jetson AGX Xavier has eight Volta Streaming Multiprocessors (SMs)
    each SM has 4xSMPs
    each SMP has 16 CUDA cores
    (These appear to be the same as the FP32 cores in an SMP.)
    → total 512 CUDA cores

Question:
From the PDF below, we understand that an SMP contains various units, including 16 FP32 Cores, 8 FP64 Cores, and 16 INT32 Cores.
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
From an FP32 performance point of view, the number of CUDA cores represents the number of FP32 cores.
Is this understanding correct?

[Detail]:
from below link:

  • The Jetson AGX Xavier integrated Volta GPU, shown in figure 3, provides 512 CUDA cores
  • The GPU includes eight Volta Streaming Multiprocessors (SMs) with 64 CUDA cores
  • Each SM consists of 4 separate processing blocks referred to as SMPs

From the guide you linked:
https://docs.nvidia.com/cuda/volta-tuning-guide/index.html#volta-tuning

  • The GV100 SM provides 64 FP32 cores and 32 FP64 cores.
    The GV100 SM additionally includes 64 INT32 cores and 8 mixed-precision Tensor Cores.

The whitepaper PDF linked above also says that each SMP has 16 FP32 Cores (which seem to correspond to the CUDA cores in an SMP):

  • The GV100 SM is partitioned into four processing blocks,
    each with 16 FP32 Cores, 8 FP64 Cores, 16 INT32 Cores,
    two of the new mixed-precision Tensor Cores for deep learning matrix arithmetic,
    a new L0 instruction cache, one warp scheduler, one dispatch unit, and a 64 KB Register File.
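
As a quick consistency check, the per-block figures quoted from the whitepaper can be multiplied out and compared with the per-SM figures quoted from the tuning guide. This is a minimal Python sketch; the dictionary and function names are illustrative, and only the unit counts come from the quoted text.

```python
# Units per processing block (SMP), as quoted from the Volta whitepaper.
UNITS_PER_BLOCK = {"FP32": 16, "FP64": 8, "INT32": 16, "Tensor": 2}
BLOCKS_PER_SM = 4  # "partitioned into four processing blocks"


def units_per_sm(units_per_block, blocks_per_sm=BLOCKS_PER_SM):
    """Scale per-block unit counts up to one full SM."""
    return {name: count * blocks_per_sm for name, count in units_per_block.items()}


per_sm = units_per_sm(UNITS_PER_BLOCK)
print(per_sm)
```

The result is 64 FP32, 32 FP64, 64 INT32 and 8 Tensor Cores per SM, which agrees with the tuning guide figures quoted above.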

Best regards,

You may also be interested in this page: Jetson Benchmarks | NVIDIA Developer.

Hi,

Thank you very much for your kindness.
I really appreciate your help.

I’d like to know the definition of “CUDA core”.
I found a related link, shown below:

[image]

So I reached the following conclusion:

  • CUDA core includes “FP32 Core & 16 INT32 Core”.

So our understanding is now as follows:

  • Jetson AGX Xavier has 8xSMs
    each SM has 4xSMPs
    each SMP has 16 CUDA cores
    → total 512 CUDA cores

  • Jetson Xavier NX has 6xSMs
    each SM has 4xSMPs
    each SMP has 16 CUDA cores
    → total 384 CUDA cores
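
The arithmetic behind these two totals can be written out as a short Python sketch. The function name and its default arguments are illustrative; the SM counts, the four SMPs per SM and the 16 CUDA cores per SMP are the figures from the lists above.

```python
def total_cuda_cores(sms, blocks_per_sm=4, cores_per_block=16):
    """Total CUDA cores = SMs x SMPs per SM x CUDA cores per SMP."""
    return sms * blocks_per_sm * cores_per_block


# SM counts as stated in this post.
for module, sms in (("Jetson AGX Xavier", 8), ("Jetson Xavier NX", 6)):
    print(module, total_cuda_cores(sms))
```

With these inputs the sketch gives 512 for Jetson AGX Xavier and 384 for Jetson Xavier NX.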

I’d like to continue investigating the definition of “CUDA core”, including the links you provided.

If there is any official document that defines “CUDA core”, please let us know.

Best regards,

This is an official document: Programming Guide :: CUDA Toolkit Documentation

hello mat-h,

There’s a CUDA sample, deviceQuery, that dumps the details of your Jetson platform.
For example: /usr/local/cuda-10.2/samples/1_Utilities/deviceQuery/

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 31921 MBytes (33471238144 bytes)
  ( 8) Multiprocessors, ( 64) CUDA Cores/MP:     512 CUDA Cores
  GPU Max Clock rate:                            1377 MHz (1.38 GHz)
  Memory Clock rate:                             1377 Mhz
  Memory Bus Width:                              256-bit
...
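
The "CUDA Cores/MP" figure in this output is not read from the hardware. As far as I know, the sample derives it from the compute capability through a lookup helper in the samples' helper_cuda.h (_ConvertSMVer2Cores). The Python sketch below parses the summary line and checks that it is self-consistent. The regular expression, the function name and the two-entry table are illustrative assumptions, not part of the sample itself.

```python
import re

# Assumption: a small subset of the compute-capability -> cores-per-SM lookup
# kept in the CUDA samples' helper_cuda.h. Only Volta-class entries relevant
# to this thread are reproduced.
CORES_PER_SM = {(7, 0): 64, (7, 2): 64}

SUMMARY_RE = re.compile(
    r"\(\s*(\d+)\)\s*Multiprocessors,\s*\(\s*(\d+)\)\s*CUDA Cores/MP:\s*(\d+)\s*CUDA Cores"
)


def parse_core_summary(line):
    """Return (sm_count, cores_per_sm, total_cores) from a deviceQuery summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a deviceQuery core summary line")
    return tuple(int(group) for group in match.groups())


line = "  ( 8) Multiprocessors, ( 64) CUDA Cores/MP:     512 CUDA Cores"
sms, cores_per_sm, total = parse_core_summary(line)

# Compute capability 7.2 (reported above) maps to 64 cores per SM.
assert cores_per_sm == CORES_PER_SM[(7, 2)]
assert sms * cores_per_sm == total
print(sms, cores_per_sm, total)
```

The same check applies to any other deviceQuery summary line of this form.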

Hi,

Thank you very much for your kindness.
I really appreciate your help.

I can also confirm that Jetson Xavier NX has “( 6) Multiprocessors” using the deviceQuery command, as shown below:

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 7764 MBytes (8140656640 bytes)
  ( 6) Multiprocessors, ( 64) CUDA Cores/MP:     384 CUDA Cores

Thank you !