using vectors in GPU kernel

e.ping · March 22, 2007, 10:38pm

Dear Experts,

I just dived into the CUDA environment on Linux, read the tutorial and understood the matrix multiplication example. In that example, the dimensions of the resultant matrix C are multiples of 16, because of the block size and for simplicity. In my project I need to process an image and produce an image whose size is not a multiple of 16. For example, the width of the resultant image is 71, so I can only fit four 16-wide blocks and there will be 7 pixels left to process. So my question is: how should I define the block size? Can I make it something other than 16? I guess 16 is some kind of magic number for making full use of the GPU. If I have to use a 16x16 block size, how should I process the remaining pixels?
Thanks in advance for the reply.

prkipfer · March 23, 2007, 10:05am

16 is not really a magic number. Performance depends on a sensible layout of the blocks that gives the fewest divergent warps and conflict-free shared-memory bank access. Search this forum for these topics and check the occupancy calculator.

For “odd” sizes, leaving excess threads idle probably has the least impact on performance.
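A minimal sketch of the "leave excess threads idle" approach, assuming a row-major `float` image and a trivial per-pixel operation. The names `divUp`, `scaleImage` and `launchScaleImage` are hypothetical, and error checking is omitted. The grid is rounded up so it covers the whole image (for a width of 71 and 16-wide blocks that is 5 blocks, i.e. 80 threads per row), and each thread checks its coordinates before touching memory.

```cuda
#include <cuda_runtime.h>

// Ceiling division: number of blocks of size `block` needed to cover `n` items.
// For n = 71 and block = 16 this gives 5 blocks (80 threads, 9 of them idle).
inline int divUp(int n, int block) { return (n + block - 1) / block; }

// Hypothetical per-pixel operation: scales every pixel of a width x height
// row-major float image. The bounds check lets the grid overshoot the image
// without reading or writing out of range.
__global__ void scaleImage(const float* in, float* out,
                           int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // excess threads do nothing
    out[y * width + x] = in[y * width + x] * factor;
}

// Host-side launch: 16x16 blocks, grid rounded up to cover every pixel.
void launchScaleImage(const float* d_in, float* d_out,
                      int width, int height, float factor)
{
    dim3 block(16, 16);
    dim3 grid(divUp(width, block.x), divUp(height, block.y));
    scaleImage<<<grid, block>>>(d_in, d_out, width, height, factor);
}
```

If the kernel uses `__syncthreads()` (as the shared-memory matrixMul kernel does), out-of-range threads should skip only their global loads and stores rather than returning early, so that every thread in the block still reaches the barrier.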

Peter

AlexTutubalin · March 23, 2007, 12:34pm

Just allocate more memory (a multiple of 16) and fill the unused elements with zeros.

Zero values will not affect the result matrix.
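A minimal sketch of the padding approach, assuming a tightly packed row-major `float` image on the host. The names `roundUp` and `uploadPadded` are hypothetical, and error checking is omitted. For matrix multiplication, zero padding along the shared dimension only adds `0 * x` terms to each dot product, and the extra rows or columns of C are simply ignored; for an image operation, the padded output pixels are likewise discarded, though a filter that reads neighbouring pixels will see zeros at the right and bottom edges.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Round n up to the next multiple of `multiple` (e.g. 71 -> 80 for 16).
inline int roundUp(int n, int multiple) { return ((n + multiple - 1) / multiple) * multiple; }

// Copies a width x height host image into a zero-filled device buffer whose
// dimensions are rounded up to multiples of 16. Returns the device pointer
// and writes the padded dimensions to *pw and *ph.
float* uploadPadded(const float* h_img, int width, int height, int* pw, int* ph)
{
    *pw = roundUp(width, 16);
    *ph = roundUp(height, 16);
    size_t bytes = (size_t)(*pw) * (*ph) * sizeof(float);

    float* d_img = nullptr;
    cudaMalloc(&d_img, bytes);
    cudaMemset(d_img, 0, bytes);                    // all-zero bits == 0.0f
    // Copy each host row into the start of a (wider) padded device row.
    cudaMemcpy2D(d_img, (size_t)(*pw) * sizeof(float),   // dst, dst pitch
                 h_img, (size_t)width * sizeof(float),   // src, src pitch
                 (size_t)width * sizeof(float), height,  // row bytes, rows
                 cudaMemcpyHostToDevice);
    return d_img;
}
```

A kernel launched on the padded buffer can then use exactly `*pw / 16` by `*ph / 16` blocks of 16x16 threads with no bounds check.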

e.ping · March 24, 2007, 6:46am

I had actually thought of that; I was wondering whether there were any other solutions. I guess I will do it that way.

Thanks for the reply.
