I tried to optimize my code (sparse matrix-vector multiplication) by choosing the optimal block size: I ran the code with different block dimensions suggested by the warp occupancy calculator (compile with the -cubin option, read off the register and shared memory usage, and run the code for the block sizes that give 100% warp occupancy), then chose the block dimension with the lowest runtime. My problem is that the best block dimension changes when I use different matrix elements.
Is the runtime(blockdim, …, matrixelements) really a function of the matrix elements? If so, how can I choose the best block dimension when the data is unknown in advance?
The matrix dimensions are fixed while I test different block sizes; the only thing that changes are the matrix elements, which are initialized each time with random float numbers.
Actually, it is a very simple algorithm: the sparse matrix S is diagonal (S_ij = 0 if i != j), so the only thing the kernel does is compute global_A_ij * global_S_ii, where each thread is responsible for one multiplication.
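If I understand the description correctly, the kernel amounts to something like the sketch below. This is an assumption about the poster's code, not the actual code: the names (`scale_by_diagonal`, `A`, `S`, `n`) are made up, `A` is assumed row-major and n x n, and the diagonal of S is assumed to be stored as a dense vector of length n.

```cuda
// Hypothetical sketch: each thread scales one element A[i][j] of a
// row-major n x n matrix by the diagonal entry S[i]. All names and the
// storage layout are assumptions, not the poster's actual code.
__global__ void scale_by_diagonal(float *A, const float *S, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n * n) {
        int row = idx / n;   // which diagonal entry applies to this element
        A[idx] *= S[row];    // A_ij = A_ij * S_ii
    }
}
```

With one thread per element and a bounds check like this, every thread does identical work regardless of the element values, which is why the answer below says load balancing should not be an issue.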
I don't really get what you mean by load balancing. How would I do that in CUDA? Maybe let one thread do more work than just computing one multiplication?
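For reference, the usual way to let one thread handle more than one element in CUDA is a grid-stride loop. This is a sketch under the same made-up assumptions as before (a row-major n x n matrix `A`, the diagonal of S stored as a vector `S`); it is a general pattern, not something the poster's algorithm necessarily needs.

```cuda
// Hypothetical grid-stride variant: each thread processes several
// elements instead of exactly one, so the kernel works correctly for
// any grid size and lets you tune block/grid dimensions independently
// of the problem size. Names and layout are assumptions.
__global__ void scale_by_diagonal_strided(float *A, const float *S, int n)
{
    int stride = gridDim.x * blockDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x;
         idx < n * n;
         idx += stride) {
        A[idx] *= S[idx / n];   // A_ij = A_ij * S_ii
    }
}
```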
Your algorithm shouldn’t have a load balancing issue, and performance shouldn’t depend on the elements.
Did you include the first run in your timing? The first kernel launch also pays for CUDA context creation and initialization, which is the most likely cause of the variation I can think of.