Shared memory bank conflicts?


I have the following code:


__shared__ int   smnumMoveSamples[BLOCK_THREAD_SIZE];
__shared__ int   smNb[BLOCK_THREAD_SIZE];
__shared__ float smw2[BLOCK_THREAD_SIZE];
__shared__ int   smInputTracePos[BLOCK_THREAD_SIZE];

//... load data to shared memory arrays.
__syncthreads(); // every thread reads every element below, so all loads must be done first (skip if the loading code already syncs)

for (int iTraceIndex = 0; iTraceIndex < BLOCK_THREAD_SIZE; iTraceIndex++)
{
	// iTraceIndex is the same in every thread, so each read hits one shared word per warp
	fTraceOutPhase  += smnumMoveSamples[iTraceIndex] + smInputTracePos[iTraceIndex] * threadIdx.x;
	fTraceOutPhase2 += smw2[iTraceIndex] * blockIdx.x;
	fTraceOutStack  += smNb[iTraceIndex] - threadIdx.x;
	fFold           += blockIdx.x;
}


This is test code I tried (the real code is a bit different, with texture accesses and so on…); however, I see that this loop accounts for ~70-80% of the kernel time, i.e. most of the kernel time is spent here! I always thought the bottleneck was the texture/global memory access…

Every thread runs this whole loop, since each thread needs to go over all the data.

I guess I’m doing the worst possible shared memory access, but can this penalty be caused by shared memory bank conflicts? Is there any way of making it better?