Launching more than 1024 threads per block in Xavier (Solved)

siquike · September 18, 2018, 7:36pm

I noticed that the Jetson Xavier has 512 cores, while the TX2 only had 256. Does this mean I can launch more than 1024 threads per block?

dusty_nv · September 18, 2018, 10:01pm

Hi siquike, the maximum number of threads per block on the Volta SM is still 1024. Here is the deviceQuery output from Xavier:

$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 15820 MBytes (16588668928 bytes)
  ( 8) Multiprocessors, ( 64) CUDA Cores/MP:     512 CUDA Cores
  GPU Max Clock rate:                            1500 MHz (1.50 GHz)
  Memory Clock rate:                             1500 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

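The per-block limit does not grow with the number of CUDA cores. Each SM can hold up to 2048 resident threads, so multiple blocks (for example, two 1024-thread blocks) can execute concurrently on the same SM.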
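To make the arithmetic concrete, here is a rough host-side sketch in plain C++ (no CUDA headers needed). The constants are copied from the deviceQuery output above. The function names are hypothetical helpers, not part of any CUDA API. The per-SM estimate considers only the thread limit and ignores register usage, shared memory, and the cap on resident blocks per SM, so treat it as an upper bound rather than a real occupancy figure.

```cpp
#include <cassert>
#include <cstdint>

// Limits copied from the deviceQuery output above (Xavier, compute capability 7.2).
constexpr int kWarpSize           = 32;
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kMaxThreadsPerSM    = 2048;

// True if a block of this size can be launched at all.
inline bool isValidBlockSize(int threadsPerBlock) {
    return threadsPerBlock > 0 && threadsPerBlock <= kMaxThreadsPerBlock;
}

// Upper bound on resident blocks per SM, from the thread limit alone.
// Threads are scheduled in whole warps, so the block size is rounded up
// to a multiple of the warp size first. Returns 0 for an invalid block size.
inline int blocksPerSMByThreads(int threadsPerBlock) {
    if (!isValidBlockSize(threadsPerBlock)) return 0;
    const int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    const int maxWarpsPerSM = kMaxThreadsPerSM / kWarpSize;  // 64 on this device
    return maxWarpsPerSM / warpsPerBlock;
}

// Number of blocks needed to cover n elements with one thread per element
// (ceiling division), i.e. the grid size for a 1D launch.
inline std::int64_t gridSizeFor(std::int64_t n, int threadsPerBlock) {
    assert(isValidBlockSize(threadsPerBlock));
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

So if you need 2048 threads working together, a block of 2048 is not valid, but `gridSizeFor(2048, 1024)` gives 2 blocks of 1024, and by the thread limit alone both can be resident on one SM at the same time.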
