maximum thread numbers

wuninsu · October 3, 2011, 1:51pm

If I am using a GTX 580, what is the maximum number of threads I can launch?

i.e. kernel<<<dim_grid, dim_block>>>

What is the maximum value of dim_grid * dim_block that still gives the best performance?

pasoleatis · October 3, 2011, 4:23pm

Run the deviceQuery program from the SDK: NVIDIA_GPU_Computing_SDK/C/bin/linux/release/./deviceQuery. How you choose the numbers will depend on the details of each problem.

wuninsu · October 4, 2011, 5:34am

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: “GeForce GTX 480”

CUDA Driver Version / Runtime Version 4.0 / 4.0

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 1536 MBytes (1610285056 bytes)

(15) Multiprocessors x (32) CUDA Cores/MP: 480 CUDA Cores

GPU Clock Speed: 1.45 GHz

Memory Clock rate: 1900.00 Mhz

Memory Bus Width: 384-bit

L2 Cache Size: 786432 bytes

Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)

Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and execution: Yes with 1 copy engine(s)

Run time limit on kernels: No

Integrated GPU sharing Host Memory: No

Support host page-locked memory mapping: Yes

Concurrent kernel execution: Yes

Alignment requirement for Surfaces: Yes

Device has ECC support enabled: No

Device is using TCC driver mode: No

Device supports Unified Addressing (UVA): Yes

Device PCI Bus ID / PCI location ID: 133 / 0

Compute Mode:

 < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce GTX 480

[deviceQuery] test results…

PASSED

This is the result of deviceQuery.

It gives me the number of cores and the maximum number of threads per block, but I still don't know how many blocks one core can hold.

Snowball_Two · October 4, 2011, 2:08pm

Each multiprocessor (not each individual CUDA core) can hold at most 8 resident blocks.

Each multiprocessor can also hold at most 1536 resident threads.

Strictly speaking, you only need 480 threads to get all 480 CUDA cores working at the same time. But to hide memory latency and the like, you should launch many, many more.
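
To put numbers on this, here is a minimal sketch in plain host-side C++ (no CUDA runtime needed). It assumes the compute capability 2.0 limits quoted in this thread (8 resident blocks and 1536 resident threads per multiprocessor) and the 15 multiprocessors from the deviceQuery output. The struct and function names are made up for illustration, and the sketch ignores register and shared memory usage, which can lower these numbers for a real kernel.

```cpp
// Hypothetical helper names; the limits are the values quoted in this
// thread, hard-coded rather than queried from a real device.
struct SmLimits {
    int max_blocks_per_sm;   // resident blocks per multiprocessor
    int max_threads_per_sm;  // resident threads per multiprocessor
    int sm_count;            // multiprocessors on the device
};

constexpr SmLimits kGtx480 = {8, 1536, 15};

constexpr int min_int(int a, int b) { return a < b ? a : b; }

// Blocks of `block_size` threads that fit on one multiprocessor at once,
// limited by both the block cap and the thread cap.
constexpr int resident_blocks_per_sm(SmLimits lim, int block_size) {
    return min_int(lim.max_blocks_per_sm, lim.max_threads_per_sm / block_size);
}

// Threads resident on one multiprocessor for a given block size.
constexpr int resident_threads_per_sm(SmLimits lim, int block_size) {
    return resident_blocks_per_sm(lim, block_size) * block_size;
}

// Threads resident on the whole device for a given block size.
constexpr int resident_threads_on_device(SmLimits lim, int block_size) {
    return resident_threads_per_sm(lim, block_size) * lim.sm_count;
}
```

Under these assumptions, 64-thread blocks hit the 8-block cap first and leave each multiprocessor with only 512 resident threads, while 192-thread blocks reach the full 1536 per multiprocessor, or 23040 across all 15.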

tera · October 4, 2011, 5:03pm

x, y, and z of dim_grid can be up to 65535 each. x and y of dim_block can be up to 1024 (z up to 64), with the additional constraint that the total block size dim_block.x * dim_block.y * dim_block.z <= 1024. Check appendix F of the Programming Guide for this.
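
These limits can be checked before a launch. A small sketch in plain host-side C++, using the numbers from the deviceQuery output above (block dimensions up to 1024 x 1024 x 64, at most 1024 threads per block, grid dimensions up to 65535 each). `Dim3` is a stand-in for CUDA's `dim3` so the snippet compiles without the CUDA headers, and the function names are hypothetical.

```cpp
// Stand-in for CUDA's dim3; all names here are illustrative only.
struct Dim3 {
    unsigned x, y, z;
};

// Block is valid if every dimension is within its own limit and the
// total number of threads does not exceed 1024.
constexpr bool block_ok(Dim3 b) {
    return b.x >= 1 && b.y >= 1 && b.z >= 1 &&
           b.x <= 1024 && b.y <= 1024 && b.z <= 64 &&
           static_cast<unsigned long long>(b.x) * b.y * b.z <= 1024ULL;
}

// Grid is valid if every dimension is between 1 and 65535.
constexpr bool grid_ok(Dim3 g) {
    return g.x >= 1 && g.y >= 1 && g.z >= 1 &&
           g.x <= 65535 && g.y <= 65535 && g.z <= 65535;
}

// A launch configuration is valid only if both parts are.
constexpr bool launch_ok(Dim3 grid, Dim3 block) {
    return grid_ok(grid) && block_ok(block);
}
```

For example, a 32 x 32 x 1 block is accepted (exactly 1024 threads), while 32 x 32 x 2 is rejected even though each dimension is individually within range.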

There are no maximum values after which performance would drop, only minimum values.

pasoleatis · October 4, 2011, 6:52pm

Hello,

From the Wikipedia page on CUDA I can only infer that a grid can have up to 65535 x 65535 x 65535 blocks with up to 1024 threads per block. That would be the maximum size, but I realize now that your question is different. GPU parallelism behaves differently from other parallel implementations. While in OpenMP or MPI you would expect a maximum number of threads or processes above which there is no benefit, in CUDA you get better performance by splitting the work into smaller tasks across many threads, because the latency (time spent accessing memory) is hidden. Each block is executed on one multiprocessor (MP), and all the threads in the block share the so-called "shared memory". The programmer has direct access to shared memory, which is very fast to access.

The answer to your question depends on the details of the problem.
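
One common way to get "smaller tasks in many threads" is to give each thread a single element and derive the number of blocks from the problem size. A sketch in plain host-side C++, assuming a one-dimensional problem of n elements; the function names and the block sizes used in the example are illustrative choices, not recommendations from this thread.

```cpp
// Number of blocks needed so that blocks * block_size >= n
// (ceiling division). Names are hypothetical.
constexpr unsigned long long blocks_for(unsigned long long n,
                                        unsigned block_size) {
    return (n + block_size - 1) / block_size;
}

// True if that many blocks fit in a single grid dimension, using the
// 65535 limit from the deviceQuery output above.
constexpr bool fits_1d_grid(unsigned long long n, unsigned block_size) {
    return blocks_for(n, block_size) <= 65535ULL;
}
```

With 256 threads per block, one million elements need 3907 blocks. Because the last block is only partly used, each thread in the kernel would compute its index as blockIdx.x * blockDim.x + threadIdx.x and return early if that index is not below n. Once the block count exceeds 65535 in x, the work has to be spread over a second grid dimension or each thread has to handle more than one element.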
