Does this code cause bank conflicts?

#include <stdio.h>

const int N = 256;

__global__ void write_double(int* input) {
    __shared__ double data[N];

    // Only thread 0 prints; every thread must perform the store, otherwise
    // there is no warp-wide shared-memory access to profile.
    if (threadIdx.x == 0) {
        printf("%p %p\n", (void*)&data[0], (void*)&data[1]);
    }

    int tid = threadIdx.x;
    data[tid] = input[tid];

    __syncthreads();
    input[tid] = (int)data[tid];  // read back so the store is not optimized away
}

int test1() {
    int* input;
    printf("sizeof double is %zu\n", sizeof(double));

    int n = N;
    cudaMalloc(&input, n * sizeof(int));
    write_double<<<1, N>>>(input);
    cudaDeviceSynchronize();  // make sure the device-side printf is flushed
    cudaFree(input);
    return 0;
}

int main() { return test1(); }

As far as I know, the bank width is 4 bytes and sizeof(double) is 8, so thread 0 accesses bytes [0~7], which fall in banks 0 and 1, and thread 16 accesses bytes [128~135], which also fall in banks 0 and 1. Is that correct?
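
For reference, here is a minimal host-side sketch of that bank arithmetic, assuming the usual configuration of 32 banks that are 4 bytes wide (these constants are assumptions baked into the snippet, not queried from the device):

#include <stdio.h>

int main(void) {
    const size_t BANK_BYTES = 4, NUM_BANKS = 32, ELEM_BYTES = 8;  /* sizeof(double) */
    for (int tid = 0; tid < 32; tid += 16) {        /* threads 0 and 16 */
        size_t addr = (size_t)tid * ELEM_BYTES;     /* byte offset into the shared array */
        printf("thread %2d: bytes [%zu..%zu] -> banks %zu and %zu\n",
               tid, addr, addr + ELEM_BYTES - 1,
               (addr / BANK_BYTES) % NUM_BANKS,
               (addr / BANK_BYTES + 1) % NUM_BANKS);
    }
    return 0;
}

It prints banks 0 and 1 for both thread 0 and thread 16, matching the reasoning above.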

I think there should be bank conflicts, but I can't see any in my profiling. I wonder why?

Thank you !

Your understanding of the bank width and of which banks are accessed is correct.

The Load/Store Unit (LSU) breaks load/store instructions wider than 32 bits per thread into multiple wavefronts, so the profiler shows an increase in wavefronts; however, these extra wavefronts are not reported as bank conflicts.
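
If you want to see this in Nsight Compute, you can compare the bank-conflict counter against the wavefront counter directly. A sketch of the invocation, assuming recent metric names and with ./a.out standing in for your compiled binary (metric names vary across versions, so verify them with ncu --query-metrics on your install):

ncu -k write_double \
    --metrics l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_st.sum,l1tex__data_pipe_lsu_wavefronts_mem_shared_op_st.sum \
    ./a.out

For the kernel above, you should see extra wavefronts for the 64-bit stores but a bank-conflict count of zero.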

There are no bank conflicts here because, when each thread accesses 8 bytes, the 32 threads in a warp are split into two phases for the shared-memory access:
first, threads T0~T15 access data across all 32 banks,
then threads T16~T31 access the next 32 banks' worth of data.

Similarly, when each thread accesses 16 bytes (e.g. a float4), the 32 threads are split into four phases, and each phase serves a quarter of the warp.
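
To make the distinction concrete, here is a minimal kernel sketch (my own illustration, not from this thread) that contrasts the conflict-free 8-byte case with a 4-byte strided pattern that really does conflict:

__global__ void bank_demo(int* out) {   // out must hold blockDim.x ints
    __shared__ double d[256];
    __shared__ int    s[512];
    int tid = threadIdx.x;

    // 8 bytes per thread: the LSU splits each warp into two 16-thread phases.
    // Each phase touches all 32 banks exactly once, so the profiler counts an
    // extra wavefront but reports no bank conflict.
    d[tid] = (double)tid;

    // 4 bytes per thread with stride 2: within a warp, threads t and t+16
    // both land in bank (2*t) % 32, a two-way conflict that IS reported.
    s[2 * tid] = tid;

    __syncthreads();
    out[tid] = (int)d[tid] + s[2 * tid];  // read back so the stores survive optimization
}

Launching this as bank_demo<<<1, 256>>>(out) and profiling it with the store metrics above should attribute conflicts only to the strided int store.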

Thank you. I have heard of this behavior; is there any official documentation that describes it?
