Column-Major Ordering

zenosparadox · June 17, 2010, 4:00pm

Hi everyone,

Does anyone know how column-ordering affects the performance of a kernel? For example, is there any performance penalty for using, say, threads (0,0), (1,0), (2,0), …, which uses all the threads from a ROW inside a block, rather than threads (0,0), (0,1), (0,2), …, which is a COLUMN of threads inside a thread block?

Same question applies to blocks. I have a program, and it seems like, unless I have things backwards in my head or backwards in my code, that using blocks of threads to access each ROW of my resulting matrix gives much better results than using COL ordering. I have verified there is a performance decrease, but am not exactly sure why.

tera · June 17, 2010, 4:17pm

The important thing is to make sure that memory accesses get coalesced, which usually means that threadIdx.x should be the trailing (fastest-varying) index of the most frequently accessed array.

Other than that, order is mostly irrelevant. In particular, the order in which blocks execute should not matter (apart from more esoteric effects like partition camping, but even that is unlikely to be influenced by block order).
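
To make the coalescing point concrete, here is a minimal sketch, assuming a row-major rows x cols float matrix (element (row, col) stored at data[row * cols + col]). The kernel names, the time_scale_ms helper, and the 32x8 block shape are hypothetical choices for illustration, not anything from the original poster's program. The two kernels do identical work and differ only in whether threadIdx.x walks the trailing (column) index or the leading (row) index.

```cuda
#include <vector>
#include <cuda_runtime.h>

// threadIdx.x -> column (trailing index): the threads of a warp read and
// write consecutive floats, so the accesses can be coalesced.
__global__ void scale_x_is_col(float* data, int rows, int cols, float s) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) data[row * cols + col] *= s;
}

// threadIdx.x -> row (leading index): neighbouring threads of a warp are
// `cols` floats apart, so each access lands in a different memory segment.
__global__ void scale_x_is_row(float* data, int rows, int cols, float s) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) data[row * cols + col] *= s;
}

// Hypothetical helper: runs one variant on a matrix of ones (one untimed
// warm-up launch, then one timed launch) and returns the timed launch's
// duration in milliseconds, or -1.0f on a CUDA error or a wrong result.
float time_scale_ms(bool x_is_col, int rows, int cols) {
    size_t count = size_t(rows) * size_t(cols);
    size_t bytes = count * sizeof(float);
    std::vector<float> host(count, 1.0f);

    float* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0f;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    // The grid's x dimension must cover whichever index threadIdx.x walks.
    dim3 block(32, 8);
    int nx = x_is_col ? cols : rows;
    int ny = x_is_col ? rows : cols;
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float ms = 0.0f;
    for (int pass = 0; pass < 2; ++pass) {  // pass 0 is the warm-up
        cudaEventRecord(start);
        if (x_is_col) scale_x_is_col<<<grid, block>>>(dev, rows, cols, 2.0f);
        else          scale_x_is_row<<<grid, block>>>(dev, rows, cols, 2.0f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
    }
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Two launches each doubled every element, so every value should be 4.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    ok = ok && host.front() == 4.0f && host.back() == 4.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    return ok ? ms : -1.0f;
}
```

Calling time_scale_ms(true, 4096, 4096) and time_scale_ms(false, 4096, 4096) from main and printing the two results gives a rough comparison of the two mappings. The size of the gap depends on the GPU and the matrix dimensions, so treat any numbers as indicative only.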
