Problem when using more than 64 threads per block

Hello,

I am writing a CUDA application that makes extensive use of shared memory and frequently transfers data between global GPU memory and CPU memory, as well as between global GPU memory and shared GPU memory. The results I get are correct as long as the number of threads per block is below 64. Above that, I get “strange” results that suggest something is going wrong with memory. To give an example: I have a value that should decrease by 2 on each execution, but after increasing the number of threads past 64 it decreases by 3 or 6 instead. It is still somewhat regular behavior, but wrong behavior nonetheless.

I am building with CUDA in Visual Studio 2008 on 64-bit Windows 7, using the standard Debug x64 configuration from the template project. My GPU is a GeForce 8800 GT.
The threads run independent code, although they do read from a common shared memory area.
Even with 256 threads I do not exceed the 16 KB of shared memory per block.

In Emulated Debug mode, everything runs fine.

Does anyone have an idea of what I could be doing wrong?

Thanks

Bogdan

Quite likely you are failing to synchronize the threads with __syncthreads() somewhere.
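
For example (a minimal sketch only, not your actual kernel; the names here are made up): a very common pattern is for each thread to write part of shared memory and for other threads to then read it. Without a barrier between the write phase and the read phase there is a race as soon as the reading thread is not in the same warp as the writing thread:

__global__ void staged_copy(const float *in, float *out, int n)
{
    __shared__ float tile[256];              // assumes blockDim.x <= 256

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // Phase 1: each thread fills its own slot in shared memory.
    tile[tid] = (gid < n) ? in[gid] : 0.0f;

    // Without this barrier, a thread in another warp may read tile[]
    // before the thread responsible for that slot has written it.
    __syncthreads();

    // Phase 2: each thread reads a slot written by a different thread.
    if (gid < n)
        out[gid] = tile[blockDim.x - 1 - tid];
}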

Indeed, I do not use __syncthreads(), but why should I, given that I do not write to the same memory area concurrently, only read from it?

SOLVED

So, in theory I wasn’t writing to the same memory area concurrently, but in practice I was. I tried synchronizing the threads with __syncthreads() almost everywhere and the problem became obvious. Thank you.
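
In case it helps anyone else, the mistake was of roughly this shape (a simplified sketch, not my real kernel; the names are invented): one phase of the kernel writes shared memory and a later phase reads slots written by other threads, with no barrier in between. Putting a __syncthreads() after every phase while debugging made it immediately clear which phase was racing:

__global__ void two_phase(int *out)
{
    __shared__ int state[128];               // assumes blockDim.x <= 128

    int tid = threadIdx.x;

    // Phase 1: every thread initialises its own slot.
    state[tid] = tid;
    __syncthreads();                         // barrier added while debugging

    // Phase 2: every thread updates its own slot from a neighbour's slot.
    int neighbour = (tid + 1) % blockDim.x;
    int v = state[neighbour];
    __syncthreads();                         // needed before overwriting slots others may still read
    state[tid] = v - 2;
    __syncthreads();                         // barrier added while debugging

    // Phase 3: write results back to global memory.
    out[blockIdx.x * blockDim.x + tid] = state[tid];
}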