Understanding deviceQuery

Running the deviceQuery sample gives me the following output:

Device 0: “GeForce GTX 650”
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1058 MHz (1.06 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

I don’t completely understand it. I tried looking it up on the internet and ended up more confused.

Here are a few questions I would like answered:

  1. How many blocks and threads can I launch at a time? It says Maximum number of threads per block: 1024, however it also says Max dimension size of a thread block (x,y,z): (1024, 1024, 64). So, can I have a block of dimension (1024, 1024, 64), and would all of those threads run simultaneously?

  2. How do the number of multiprocessors and the number of CUDA cores per multiprocessor affect my programming?

  3. What does Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) mean? How do I calculate the maximum number of blocks, and threads per block, that I can deploy?

Regards

The maximum number of threads per block is 1024, so you must choose the dimensions of the thread block such that x*y*z <= 1024, while also observing each of the individual limits on x, y, and z. So you could have a (1024,1,1) block, or a (1,1024,1) block, or a (32,32,1) block, or a (4,4,64) block, etc. Often, better performance is achieved by using smaller thread blocks containing only 128 or 256 threads. See the Best Practices Guide for guidance.
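To make this concrete, here is a minimal sketch (the kernel name and the particular block shapes are just placeholders) showing several block configurations that all respect x*y*z <= 1024 and the per-axis limits:

```
#include <cuda_runtime.h>

__global__ void myKernel()
{
    // placeholder kernel body
}

int main()
{
    // All of these respect x*y*z <= 1024 and the per-axis limits (1024, 1024, 64).
    dim3 blockA(1024, 1, 1);   // 1024 threads
    dim3 blockB(32, 32, 1);    // 1024 threads
    dim3 blockC(4, 4, 64);     // 1024 threads
    dim3 blockD(256, 1, 1);    // 256 threads; smaller blocks are often a better starting point

    myKernel<<<1, blockD>>>(); // launch a single block of 256 threads
    cudaDeviceSynchronize();
    return 0;
}
```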

For the grid, you just need to stay within the stated limits for each dimension. The theoretical maximum number of threads would be the maximum number of blocks in a grid (2147483647 * 65535 * 65535) times the maximum number of threads per block (1024). I have yet to encounter a real-life application that fully exploits the maximum grid dimensions.
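If you prefer not to hard-code the numbers from deviceQuery, the same limits can be read at run time with cudaGetDeviceProperties; a minimal sketch, assuming device 0 as in your output:

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0, as in the deviceQuery output above

    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dims: (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dims:  (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}
```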

In general, the shape and size of grids and thread blocks are a function of how you map data to threads. For example, you may operate on 2D data in such a way that each 16x16 thread block handles a 16x16 sub-matrix, and the grid is sized to allow for however many blocks are needed to tile the entire matrix. To stay with the example, if the entire matrix is 2048x1024 elements, you would need a grid of 128x64 blocks.
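A rough sketch of that tiling scheme (the matrix size, the doubling kernel, and the identifier names are only illustrative):

```
#include <cuda_runtime.h>

__global__ void scaleMatrix(float *data, int width, int height)
{
    // Each thread handles one element of the 2D matrix.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        data[row * width + col] *= 2.0f;
}

int main()
{
    const int width = 2048, height = 1024;       // the example matrix from above
    float *d_data;
    cudaMalloc(&d_data, width * height * sizeof(float));

    dim3 block(16, 16);                          // 256 threads per block
    dim3 grid((width  + block.x - 1) / block.x,  // 128 blocks in x
              (height + block.y - 1) / block.y); //  64 blocks in y

    scaleMatrix<<<grid, block>>>(d_data, width, height);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

The rounded-up division in the grid calculation ensures the last (possibly partial) tile in each dimension is still covered; the bounds check in the kernel keeps those extra threads from writing out of range.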

This clears up a lot of my doubts.
However, two questions remain:

  1. What is the significance of ( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores? How can I use this information?

  2. What do we mean by Max dimension size of a thread block (x,y,z): (1024, 1024, 64), particularly in contrast to the fact that the maximum number of threads in a block is 1024?