Unexpected CUDA processing time dependency on thread count

alexander.marinsek · April 17, 2021, 8:06am

When launching a kernel, the number of threads per block should ideally be a multiple of the warp size (32 threads). This makes more efficient use of the hardware and lowers processing time. However, another factor also seems to reduce the processing time periodically. In my timing plot, the processing time steps every 32 threads, while an additional speed-up occurs at every multiple of 11 threads per block. What could be causing this behaviour?
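The warp-size part of this can be made concrete with a little arithmetic. A block of n threads runs as ceil(n/32) warps, and any lanes left over in the last warp are masked off but still take up a warp slot. A minimal sketch, assuming the 32-thread warp size of NVIDIA GPUs; `warp_usage` is a hypothetical helper for illustration only:

```python
import math

WARP_SIZE = 32  # warp size on NVIDIA GPUs, including the GT 730

def warp_usage(threads_per_block):
    """Return (warps per block, idle lanes in the last warp) for n threads."""
    warps = math.ceil(threads_per_block / WARP_SIZE)
    idle_lanes = warps * WARP_SIZE - threads_per_block
    return warps, idle_lanes

for n in (31, 32, 33, 64, 65):
    warps, idle = warp_usage(n)
    print(f"{n:4d} threads -> {warps} warp(s), {idle} idle lane(s)")
```

Each time n crosses a multiple of 32, a new and mostly empty warp is added, which fits a step every 32 threads. This arithmetic alone says nothing about the multiple-of-11 pattern.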

The GPU in question is a GeForce GT 730, running the kernel function listed at the bottom of this post. For timing, the kernel is invoked in a loop over i using:

kernel_generate_image[(16,16),(1,i+1)](px, 32)

where px = np.zeros([1024,1024])
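With this launch, each of the 16x16 blocks has shape (1, i+1), and the kernel walks the 1024x1024 image with grid-stride loops, so the work each thread does depends on the block size. One rough way to see this is to count loop iterations per thread. This is a minimal sketch under the assumption that iteration counts are a useful first-order proxy; it ignores memory traffic, instruction latency and warp scheduling, and `stride_loop_work` is a hypothetical helper, not part of Numba:

```python
SHAPE = (1024, 1024)   # px = np.zeros([1024, 1024])
GRID = (16, 16)        # blocks per grid in the original launch

def stride_loop_work(n):
    """Count grid-stride loop iterations for kernel_generate_image[(16,16),(1,n)]."""
    block = (1, n)                       # blockDim.x == 1, blockDim.y == n
    stride_x = GRID[0] * block[0]        # always 16
    stride_y = GRID[1] * block[1]        # 16 * n
    # Thread (x=0, y=0) starts at index 0, so it runs the most iterations
    max_iters = len(range(0, SHAPE[0], stride_x)) * len(range(0, SHAPE[1], stride_y))
    # The last thread in each direction starts at stride - 1 and runs the fewest
    min_iters = (len(range(stride_x - 1, SHAPE[0], stride_x))
                 * len(range(stride_y - 1, SHAPE[1], stride_y)))
    # Threads whose starting y is past the last column never enter the j-loop
    idle_threads = max(0, stride_y - SHAPE[1]) * stride_x
    return {"max_iters": max_iters, "min_iters": min_iters, "idle_threads": idle_threads}

for n in (11, 32, 33, 64, 65):
    print(n, stride_loop_work(n))
```

In this model, going from n = 32 to n = 33 keeps the longest-running thread at 128 iterations but leaves the last threads with only 64, so the extra threads add imbalance rather than shortening the critical path.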

import numpy as np
from math import sin, pi
from numba import cuda

@cuda.jit
def kernel_generate_image(image, T):

    # Calculate the thread's absolute position within the grid
    x = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    y = cuda.threadIdx.y + cuda.blockIdx.y * cuda.blockDim.y

    # Set stride equal to the number of threads we have available in either direction
    stride_x = cuda.gridDim.x * cuda.blockDim.x
    stride_y = cuda.gridDim.y * cuda.blockDim.y

    # Grid-stride loops: each thread fills every stride_x-th row and every stride_y-th column
    for i in range(x, image.shape[0], stride_x):
        for j in range(y, image.shape[1], stride_y):
            image[i, j] = (sin(i*2*pi/T+1)*sin(j*2*pi/T+1)*0.25)
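To confirm that every launch configuration in the sweep still fills the whole image correctly, the kernel's output can be compared against a CPU version of the same formula. A minimal NumPy sketch; `generate_image_reference` is a hypothetical name, not part of the original code:

```python
import numpy as np

def generate_image_reference(shape=(1024, 1024), T=32):
    """CPU version of the kernel's formula, for checking GPU output."""
    i = np.arange(shape[0], dtype=np.float64)[:, None]   # row index (kernel's i)
    j = np.arange(shape[1], dtype=np.float64)[None, :]   # column index (kernel's j)
    # Broadcasting the row and column terms gives the full image in one expression
    return np.sin(i * 2 * np.pi / T + 1) * np.sin(j * 2 * np.pi / T + 1) * 0.25
```

Because px is a host array, Numba copies the result back into it after each launch, so `np.allclose(px, generate_image_reference(px.shape, 32))` would be expected to hold for every block size tested.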
