I am attempting to implement persistent threads, as described in https://mediatech.aalto.fi/~samuli/publications/aila2009hpg_paper.pdf, in my CUDA application running on a GTX 570.
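To make the question concrete, below is a stripped-down sketch of the persistent-threads pattern I am trying to follow: a fixed pool of blocks stays resident on the GPU and keeps claiming work-item indices from a global queue counter with atomicAdd until the queue is exhausted. The kernel name, processWorkItem(), and the queue layout are placeholders for my actual workload, not code taken from the paper.

// Persistent-threads skeleton (sketch only). Each resident block repeatedly
// claims a batch of work items from a global counter and processes them.
__device__ unsigned int g_queueHead;              // index of the next unclaimed work item

__device__ void processWorkItem(unsigned int idx)
{
    // placeholder for the real per-item computation
}

__global__ void persistentKernel(unsigned int numWorkItems)
{
    __shared__ volatile unsigned int s_base;      // start of this block's current batch

    while (true)
    {
        // One thread per block claims blockDim.x items for the whole block.
        if (threadIdx.x == 0)
            s_base = atomicAdd(&g_queueHead, blockDim.x);
        __syncthreads();

        if (s_base >= numWorkItems)
            break;                                // queue drained: the whole block retires

        unsigned int idx = s_base + threadIdx.x;
        if (idx < numWorkItems)
            processWorkItem(idx);
        __syncthreads();                          // regroup before claiming the next batch
    }
}

This is the device information reported for the card: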
Major revision number: 2
Minor revision number: 0
Name: GeForce GTX 570
Total global memory: 1310272 kb
Total global memory: 1279 mb
Total shared memory per block: 49152
Total registers per block: 32768
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension 0 of block: 1024
Maximum dimension 1 of block: 1024
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 65535
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Clock rate: 1540000
Total constant memory: 65536
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 15
Kernel execution timeout: Yes
Compute capability: 2.0.
--ptxas-options=-v gives the following output for my kernel:
ptxas info : Used 46 registers, 64 bytes smem, 48 bytes cmem[0], 24 bytes cmem[16]
Plugging these numbers into NVIDIA's CUDA GPU Occupancy Calculator (compute capability 2.0, 128 threads per block, 46 registers per thread, 64 bytes of shared memory per block) gives 5 active thread blocks per multiprocessor; I assume register usage is the limiting resource, since 46 × 128 ≈ 5888 registers per block and 32768 / 5888 ≈ 5.5.
I'm confused about how to interpret this when deciding how many blocks to launch. Since the card has 15 SMs and the calculator reports 5 active thread blocks per multiprocessor, should I simply be launching 5 × 15 = 75 blocks with 128 threads per block? Any clarification would be much appreciated.
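For reference, this is roughly how I was planning to size the launch, using the calculator's 5-blocks-per-SM figure and querying the SM count at runtime instead of hard-coding 15. persistentKernel and g_queueHead refer to the sketch above, and the work-queue size is a placeholder:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);            // reports 15 multiprocessors on the GTX 570

    const int blocksPerSM     = 5;                // figure from the occupancy calculator
    const int threadsPerBlock = 128;
    const int numBlocks       = blocksPerSM * prop.multiProcessorCount;   // 5 * 15 = 75

    unsigned int numWorkItems = 1u << 20;         // placeholder size of my work queue

    // Reset the global queue head, then launch the fixed pool of persistent blocks.
    unsigned int zero = 0;
    cudaMemcpyToSymbol(g_queueHead, &zero, sizeof(zero));

    persistentKernel<<<numBlocks, threadsPerBlock>>>(numWorkItems);
    cudaDeviceSynchronize();

    std::printf("launched %d blocks of %d threads\n", numBlocks, threadsPerBlock);
    return 0;
}

The idea, as I understand the paper, is that the grid exactly fills the machine (blocks per SM from the occupancy calculator × number of SMs), and the while-loop inside the kernel replaces launching one block per work item.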