Result of bank conflict testing is weird!

Hi, all,

I recently ran some experiments to evaluate performance with and without bank conflicts. However, the run times look weird. Please take a look at my code:

const int gridsize  = 1;
const int blocksize = 256;
const int N = gridsize * blocksize;

__global__ void no_bk(int iter)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int3 test[N];
    int3 temp = test[idx];
}

The code above is not supposed to have any bank conflicts, because the stride is 3, as shown in the CUDA Programming Guide 3.0 (Fig. G-2).

__global__ void bk(int iter)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int2 test[N];
    int2 temp = test[idx];
}

The code above is supposed to have bank conflicts, because the stride is 2, as shown in the CUDA Programming Guide 3.0 (Fig. G-2).

However, the measured run times are:

no bk: 0.117569 seconds

with bk: 0.000013 seconds


How can this be possible? Or is something wrong in my code?

Thanks


My GPU is compute capability 1.1.

This is how I measure the time:


dim3 block(blocksize);
dim3 grid(gridsize);

cutCreateTimer(&timer);
cutStartTimer(timer);

no_bk<<<grid, block>>>(internal_iters);
cudaThreadSynchronize();

cutStopTimer(timer);
printf("no bk: %f seconds\n", cutGetTimerValue(timer) / 1000);
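One caveat about this harness (an assumption on my part, not something confirmed in the thread): the very first kernel launch in a process also pays the one-time CUDA context/module initialization cost, which could easily account for a ~0.1 s reading. A common fix is a warm-up launch before the timed one, sketched here with the same cutil timer API:

```cuda
// Warm-up launch: absorbs one-time context/module initialization
// so it is not counted in the measurement.
no_bk<<<grid, block>>>(internal_iters);
cudaThreadSynchronize();

cutResetTimer(timer);
cutStartTimer(timer);

no_bk<<<grid, block>>>(internal_iters);   // the launch actually timed
cudaThreadSynchronize();

cutStopTimer(timer);
printf("no bk: %f seconds\n", cutGetTimerValue(timer) / 1000);
```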

No code is generated by nvcc, because you don’t output any data.

You can use cuobjdump to check this.

Both of those kernels will compile to empty functions because of compiler optimization: temp is never used, so the loads are dead code. I don’t know what time difference you are measuring, but it certainly has nothing to do with the code of either kernel.
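If you want the shared-memory reads to survive optimization, one option (a sketch, not tested on this hardware) is a loop-carried index chase: each load produces the address of the next, so the compiler can neither delete nor hoist the loads, and only a single global store is needed at the end.

```cuda
__global__ void no_bk_chase(int iter, int *output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int3 test[256];                 // N = 256, as in the original

    test[idx] = make_int3(idx, idx, idx);      // element i holds i, so the
    __syncthreads();                           // chase below stays at idx

    int j = idx;
    for (int i = 0; i < iter; ++i)
        j = test[j].x;                         // stride-3 shared read each
                                               // iteration; feeds next address
    output[idx] = j;                           // one store keeps the loop alive
}
```

Because each thread keeps reading its own element, the stride-3 access pattern is repeated iter times while the global-memory traffic stays constant.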

Thank you both. But the problem is that if I add an output to these kernel functions, there will be a memory-coalescing violation, which will affect the accuracy of the bank-conflict measurement. Do you have an example that exclusively measures the performance of bank conflicts, without the influence of other issues (like memory coalescing)?

Thanks a lot

Deryk.

The following is my code:

__global__ void no_bk(int iter, int3 *output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int3 test1[N];
    for (int i = 0; i < iter; ++i)
        output[idx] = test1[idx];
}

__global__ void bk(int iter, int2 *output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int2 test2[N];
    for (int i = 0; i < iter; ++i)
        output[idx] = test2[idx];
}

Result is:

Starting GPU test v1 …

no bk: 0.020932 seconds

with bk: 0.001793 seconds

Still weird: performance with bank conflicts is better than without. Here N is 256.

Thanks.
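One thing worth noting about the kernels above: the loop body output[idx] = test1[idx] does not depend on i, so the compiler is free to collapse the whole loop into a single load and store, in which case iter has no effect on what is measured. A sketch of a version whose loop cannot be collapsed (illustrative only, untested on compute-capability 1.1 hardware) makes each iteration depend on the previous one and stores once at the end:

```cuda
__global__ void bk_chase(int iter, int *output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int2 test2[256];            // N = 256, as above

    test2[idx] = make_int2(idx, idx);      // element i holds i
    __syncthreads();

    int j = idx;
    for (int i = 0; i < iter; ++i)
        j = test2[j].x;                    // stride-2 shared read every
                                           // iteration: 2-way bank conflict
    output[idx] = j;                       // single store; result is used,
                                           // so the loop survives optimization
}
```

With the global store outside the loop, the shared-memory access pattern dominates the run time for large iter, so coalescing no longer masks the bank-conflict effect.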