2d grid and block performance

ronanrmo · November 26, 2008, 10:12am

Hi,

I’m trying to compare the performance of several CUDA implementations of the same problem, a weighted-Jacobi iterative solver.
The problem is that when I use 2D grids and blocks the performance is very poor: about 3.5 times slower than the implementation with a 1D grid and 1D blocks.
I’ve already checked memory coalescing with the cudaProfiler, and there are no uncoalesced accesses. I’m using textures for the input vector, but I don’t think the memcpy needed for the 2D version is the bottleneck, since I removed that part of the code for testing.
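For reference, this is the standard weighted (damped) Jacobi update in matrix form. The notation is assumed here, since the post does not say which matrix is being solved: $A = D + L + U$ with $D$ the diagonal of $A$, $b$ the right-hand side and $\omega$ the weight.

```latex
x^{(k+1)} = (1-\omega)\,x^{(k)} + \omega\,D^{-1}\bigl(b - (L+U)\,x^{(k)}\bigr)
          = x^{(k)} + \omega\,D^{-1}\bigl(b - A\,x^{(k)}\bigr)
```

Every entry of $x^{(k+1)}$ depends only on $x^{(k)}$, so one thread per entry and one kernel launch per step maps directly onto this update.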

I’m basically calling the kernel 1000 times for each version, and each kernel call performs one weighted-Jacobi step. It is a very simple implementation. The 1D version uses 1D textures (tex1Dfetch to fetch the values) and the 2D version uses tex2D.

Something I noticed in the cudaProfiler is that the instruction count is 4 times higher for the 2D version. But I’ve checked the code for both, and the 2D version seems to have only about 30% more instructions. Does the 2D grid add hidden instructions?

Thanks in advance,
Ronan

">
Sorry for my English. :">
