Threads Per Block Issue

aclub · September 7, 2010, 12:43pm

Hi,

I am new to CUDA programming and am confused about the threads-per-block limit. I am using a GTX 285, which supports a maximum of 512 threads per block.

During testing, I ran my program with 1024 threads per block and it worked without any errors! Can anyone tell me why? I am also getting much better performance with 1024 threads per block.
Thank you.

Regards,
M. Awais

avidday · September 7, 2010, 12:54pm

Failure to find errors usually stems from a failure to look for them. If you check for errors straight after the kernel launch with a cudaGetLastError() call, I am willing to bet you will find that your kernels never launch and instead fail with an invalid configuration argument error. That is the reason for your observed “much better performance”: the kernels are not running at all, and failing to launch is much faster than actually running. If you are seeing good-looking results in the memory you copy back from the device, they are probably left over from an earlier run with 512 or fewer threads per block. Device memory isn’t cleared or touched from context to context.
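For illustration, here is a minimal sketch of the kind of check described above. The kernel `fill`, the helper `check`, and the buffer size are made up for the example and are not from the original program. It also clears the device buffer before the launch, so stale results from an earlier run cannot pass for fresh ones. It uses cudaDeviceSynchronize(), which on 2010-era toolkits would have been cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its global index.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: report a CUDA error and return false on failure.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 20;
    int *d_out = nullptr;

    // Ask the device for its real limit instead of assuming one.
    cudaDeviceProp prop;
    if (!check(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties")) return 1;
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    if (!check(cudaMalloc(&d_out, n * sizeof(int)), "cudaMalloc")) return 1;

    // Clear the buffer so leftovers from a previous run are not mistaken
    // for output of this launch.
    if (!check(cudaMemset(d_out, 0, n * sizeof(int)), "cudaMemset")) return 1;

    // On a device whose limit is 512 (e.g. compute capability 1.x),
    // this launch is expected to fail; on newer devices it should succeed.
    const int threads = 1024;
    const int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d_out, n);

    // Launch-configuration errors are reported here, right after the launch.
    bool ok = check(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs show up at synchronization.
    ok = ok && check(cudaDeviceSynchronize(), "kernel execution");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

If the launch check fails, any timing taken around the kernel measures only the cost of a rejected launch, not of the computation.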
