using syncthreads still at n00b status

KChou · November 29, 2010, 10:08pm

Hi all,

the use of syncthreads has got me confused.

I’m trying to do something really simple:

load an original array into a shared memory array
write the contents of the shared memory array into a global memory array
print out the contents of the global memory array. I should get global memory array = original array

everything works fine if i don’t use __syncthreads();

but if I use use __syncthreads(); after loading original array into global memory array then my output is wrong.

why is this?

my code:

resultIndex = w+h*dest->width;

extern __shared__ int sArray[];

int* sKernel = (int*)&sArray;

//put a small matrix into shared memory of each block

if(threadIdx.y < kernel->width && threadIdx.x < kernel->width)

	sKernel[threadIdx.x+threadIdx.y*matrixWidth] = kernel->matrixGPUelement(threadIdx.x+threadIdx.y*matrixWidth);

__syncthreads(); //this screws up my result

output[resultIndex]=sKernel[threadIdx.x+threadIdx.y*matrixWidth];

seibert · November 30, 2010, 2:42pm

What are the variables w, h, and dest->width set to?

KChou · November 30, 2010, 11:30pm

h=blockIdx.y * blockDim.y + threadIdx.y;

w=blockIdx.x * blockDim.x + threadIdx.x;

dest->width = 64 for my testing

and kernel->width = matrixWidth = 3

seibert · November 30, 2010, 11:50pm

Hmm, two comments:

__syncthreads() appears to be unnecessary for the code you post because each thread writes to and reads from a separate shared memory location. (In fact, shared memory is completely unnecessary in this code fragment, but that might be because you deleted some lines to simplify the example for the forum.) __syncthreads() is needed if you are going to have two different threads read and write to the same shared memory location at different points in your code. The execution barrier ensures that all the writes finish before different threads read the values. If each thread reads the same location it wrote to, the barrier is not required.
Adding the __syncthreads() should do nothing to this code, so the fact that you get a wrong answer is still concerning. Do you check for return codes from all the CUDA function calls?

KChou · December 1, 2010, 12:19am

thanks for the fast reply.

I worked on the code some more and found out __syncthreads() wasn’t the source of the problem. I still don’t know what’s causing the problem, however, and my kernel wasn’t returning any errors.

Here’s a little more of my code, hopefully it will help pinpoint the problem.

let me define some variables…

int* sSub = (int*)&sArray;				//shared memory array for sub-sample

	int* sKernel = (int*)&sArray[256*4];	//shared memory array for kernel

        resultIndex = w+h*dest->width;	//for the grid

        sampleIndex = w+h*image->width;	//for the grid

	kernelIndex = threadIdx.x+threadIdx.y*kernel->width;	//for every block

doing this works, it prints out the kernel in every block:

//place kernel in shared memory

	//one kernel in every block, all kernels fit in block

	if(threadIdx.y < kernel->width && threadIdx.x < kernel->width)

		sKernel[kernelIndex] = kernel->matrixGPUelement(kernelIndex);

	

	output[resultIndex]= sKernel[kernelIndex];

doing this works too, it prints out the matrix i’m working with:

//place a submatrix of image into shared memory

	sSub[sampleIndex] = image->matrixGPUelement(sampleIndex);

	output[resultIndex]= sSub[sampleIndex];

but this doesn’t work, which should print out kernels in every block:

//place kernel in shared memory

	//one kernel in every block, all kernels fit in block

	if(threadIdx.y < kernel->width && threadIdx.x < kernel->width)

		sKernel[kernelIndex] = kernel->matrixGPUelement(kernelIndex);

		

	//place a submatrix of image into shared memory

	sSub[sampleIndex] = image->matrixGPUelement(sampleIndex);//I simply added this line of code 

	

	output[resultIndex]= sKernel[kernelIndex];

why does adding that line there mess up my result?

thanks in advance!

Topic		Replies	Views
Shared Memory Problems - __syncthreads() doesn't work? CUDA Programming and Performance	5	2675	December 29, 2011
syncthreads() issue CUDA Programming and Performance	3	1735	March 29, 2009
__syncthreads() and global memory CUDA Programming and Performance	1	2497	December 1, 2008
CUDA BUG? Shared memory contents differ across threads __syncthreads() not working??? CUDA Programming and Performance	1	1895	September 10, 2009
__syncthreads question CUDA Programming and Performance	9	2128	September 30, 2009
__syncthreads screwes calculation CUDA Programming and Performance	2	3420	November 22, 2007
__syncthreads thread syncronization CUDA Programming and Performance	7	18769	October 27, 2009
__syncthreads() not syncing the threads, although not in if statement CUDA Programming and Performance	1	654	April 26, 2016
shared memory and __syncthreads() one writer, n readers CUDA Programming and Performance	5	3034	August 25, 2008
different results why? CUDA Programming and Performance	3	1497	April 23, 2010

using syncthreads still at n00b status

Related topics