Different work-group sizes lead to different performance

Biaowang · November 22, 2011, 11:40pm

Hey, guys:
I am writing a kernel that loads data from global memory into shared memory and then performs a computation. When I shrink the work-group size to a half or a quarter (from 256 to 128 or 64) and have each thread load its data in a loop, the performance changes a lot. I profiled the kernel with the Visual Profiler and found that the number of global memory requests differs somewhat, but not in proportion to the work-group size. Any ideas that would explain this?
Thanks in advance!
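
For readers trying to picture the setup: the kernel itself is not shown in this thread, so the following is only a rough host-side C++ model of the loading pattern described above, not the original code. It assumes a 256-element tile, a warp size of 32, fully coalesced accesses, and that one global request is counted per warp per load instruction. The names `simulate_tile_load`, `LoadStats`, and `kWarpSize` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model, not a real GPU: one global-memory request is counted
// per active warp per load instruction (assumes fully coalesced accesses).
constexpr std::size_t kWarpSize = 32;

struct LoadStats {
    std::size_t iterations_per_thread;  // trips through the load loop
    std::size_t warp_requests;          // warp-level load requests per tile
};

// Emulates one work-group copying a tile into a local ("shared") buffer
// with a group-stride loop, i.e. the device-side pattern
//   for (i = tid; i < tile_size; i += group_size) shared[i] = global[i];
LoadStats simulate_tile_load(const std::vector<float>& global,
                             std::vector<float>& shared,
                             std::size_t group_size) {
    assert(group_size > 0 && group_size % kWarpSize == 0);
    const std::size_t tile_size = shared.size();
    assert(global.size() >= tile_size);

    LoadStats stats{0, 0};
    // Each pass of the outer loop is one iteration of every thread's load loop.
    for (std::size_t base = 0; base < tile_size; base += group_size) {
        ++stats.iterations_per_thread;
        for (std::size_t warp = 0; warp < group_size / kWarpSize; ++warp) {
            bool warp_active = false;
            for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
                const std::size_t i = base + warp * kWarpSize + lane;
                if (i < tile_size) {
                    shared[i] = global[i];
                    warp_active = true;
                }
            }
            if (warp_active) ++stats.warp_requests;
        }
    }
    return stats;
}
```

Calling `simulate_tile_load` with a 256-element tile and group sizes of 256, 128, and 64 gives 1, 2, and 4 loop iterations per thread, but 8 warp-level requests in every case. Under these assumptions the request count per tile does not depend on the work-group size, which may be one reason the profiler counter does not scale with it. The performance gap itself could come from factors this model does not capture, such as occupancy or synchronization cost.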

Biaowang · November 23, 2011, 4:11pm

Just a note that these slides are very interesting and may help with this problem: http://nvidia.fullviewmedia.com/gtc2010/0922-a5-2238.html
