__syncthreads() and loops

Hi,

I suspect the following code has a synchronization bug that causes faulty results. Any assistance is more than welcome.

Thanks in advance.

// smInputLoops is a shared memory int
	for ( int i = 0; i < smInputLoops; i++ )
	{
		__syncthreads();

		// find where the input starts for the current iteration for all threads.
		if ( 0 == threadIdx.x )
		{
			smInputPos = pDeviceInputPerVAGC[ smInputIndex ];
			smInputIndex++;
		}
		__syncthreads();

		// load the shared information
		smSample1[ threadIdx.x ] = pDeviceInput[ smInputPos + iTimeIndex ];
		smSample2[ threadIdx.x ] = pDeviceInput[ smInputPos + constKernelParams[ 7 ] + iTimeIndex ];
		__syncthreads();

		// Accumulate the results into shared memory
		smResults[ threadIdx.x ] += smSample1[ threadIdx.x ] * smSample1[ threadIdx.x ];
	}
	__syncthreads();

	// Write accumulated result from shared memory to global memory
	pDeviceOutput1[ smOutputPos + threadIdx.x ] = smResults[ threadIdx.x ];

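Well, you don’t need the first __syncthreads(): the barrier after the loads already separates this iteration’s reads of smInputPos from the next iteration’s write. Still, an extra barrier is definitely not something that would cause a wrong result.
The only problem I see is that smResults is not initialized.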
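
To make the barrier question concrete, here is a minimal sketch of the same loop pattern with only the two barriers that the data flow seems to require. It is not the original kernel: accumulateSquares, starts, input, out, BLOCK and countMismatches are hypothetical names, the block is assumed to be one-dimensional with 64 threads, and the loop count is passed as a kernel argument instead of being read from shared memory.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 64   // assumed 1-D block size (hypothetical)

// Hypothetical reduction of the posted loop: starts[] stands in for
// pDeviceInputPerVAGC and input[] for pDeviceInput.
__global__ void accumulateSquares(const int* starts, int loops,
                                  const float* input, float* out)
{
    __shared__ int   smInputPos;
    __shared__ float smResults[BLOCK];

    smResults[threadIdx.x] = 0.0f;        // each thread clears its own slot

    for (int i = 0; i < loops; i++)
    {
        if (threadIdx.x == 0)
            smInputPos = starts[i];       // single writer per iteration
        __syncthreads();                  // make smInputPos visible to the block

        float s = input[smInputPos + threadIdx.x];
        smResults[threadIdx.x] += s * s;  // private slot, no barrier needed here

        __syncthreads();                  // every thread has read smInputPos
                                          // before thread 0 overwrites it
    }
    out[threadIdx.x] = smResults[threadIdx.x];
}

// Runs one block and returns how many outputs differ from a CPU reference
// (-1 if the copy back from the device fails).
int countMismatches()
{
    const int loops = 3;
    const int n = loops * BLOCK;
    std::vector<int>   starts(loops);
    std::vector<float> input(n), result(BLOCK);
    for (int i = 0; i < loops; i++) starts[i] = i * BLOCK;
    for (int k = 0; k < n; k++)     input[k] = float(k % 7);

    int* dStarts; float* dInput; float* dOut;
    cudaMalloc(&dStarts, loops * sizeof(int));
    cudaMalloc(&dInput,  n * sizeof(float));
    cudaMalloc(&dOut,    BLOCK * sizeof(float));
    cudaMemcpy(dStarts, starts.data(), loops * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dInput,  input.data(),  n * sizeof(float),   cudaMemcpyHostToDevice);

    accumulateSquares<<<1, BLOCK>>>(dStarts, loops, dInput, dOut);
    cudaError_t err = cudaMemcpy(result.data(), dOut, BLOCK * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dStarts); cudaFree(dInput); cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < BLOCK; t++)
    {
        float expected = 0.0f;
        for (int i = 0; i < loops; i++)
        {
            float s = input[starts[i] + t];
            expected += s * s;
        }
        if (result[t] != expected) mismatches++;
    }
    return mismatches;
}
```

Calling countMismatches() from a small main() should return 0 on a working device. Removing the second barrier reintroduces the hazard discussed in this thread, where thread 0 can overwrite smInputPos while slower threads are still reading the old value, although such a race may not show up on every run.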

Hi,

smResults is initialized to zero before this loop :)

I added the first __syncthreads() because I think I saw in emulation mode that smInputPos changed before all the threads in the block were done, which caused different threads to work on different inputs :)

Any further suggestions, please?

eyal

Could you post a little more code above and below the “problem section”?
Also, what block size are you using? If blockDim.y or blockDim.z is greater than 1, several threads within the same block have threadIdx.x == 0. They would all update smInputPos and smInputIndex concurrently and could read pDeviceInputPerVAGC with different indices, in which case the result is undefined.
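
As a rough illustration of the block-shape concern, the sketch below counts how many threads of one block pass the test threadIdx.x == 0 and how many pass a test on a linear thread index. The names countLeaders, leaderCounts and LeaderCounts are hypothetical and only serve this demo; they are not part of the posted kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical result type for the demo.
struct LeaderCounts { int byX; int byLinear; };

// counts[0]: threads with threadIdx.x == 0
// counts[1]: threads whose linear index within the block is 0
__global__ void countLeaders(int* counts)
{
    // Linear thread index, valid for 1-D, 2-D and 3-D blocks.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;

    if (threadIdx.x == 0) atomicAdd(&counts[0], 1);  // once per (y, z) pair
    if (tid == 0)         atomicAdd(&counts[1], 1);  // exactly one thread
}

// Launches a single block of the given shape and returns both counts.
LeaderCounts leaderCounts(dim3 block)
{
    int host[2] = { 0, 0 };
    int* dCounts;
    cudaMalloc(&dCounts, sizeof(host));
    cudaMemcpy(dCounts, host, sizeof(host), cudaMemcpyHostToDevice);

    countLeaders<<<1, block>>>(dCounts);

    cudaMemcpy(host, dCounts, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dCounts);

    LeaderCounts c = { host[0], host[1] };
    return c;
}
```

For a 16 x 4 x 1 block, leaderCounts(dim3(16, 4, 1)) should report 4 threads for the threadIdx.x test and 1 for the linear test. In the posted loop, that would mean four threads incrementing smInputIndex in the same iteration, so guarding the update with the linear index (or launching a strictly one-dimensional block) is one way to keep a single writer.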