Weird bank conflict count in matrix transpose

ys9617 · February 1, 2020, 3:46pm

I can't understand the result that the Nsight Compute report shows.

This is my code:

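#include "cuda_runtime.h"
#include "device_launch_parameters.h"

const int BDIMX = 32;
const int BDIMY = 16;

// Transposes a BDIMY x BDIMX matrix through shared memory using a single block.
__global__ void matrixTranspose1(int *result, int *m) {
	__shared__ int smem[BDIMY][BDIMX];

	// Linear index of this thread within the block (row-major).
	size_t gidx = threadIdx.y * blockDim.x + threadIdx.x;

	// Coordinates of the same linear index in the transposed (BDIMX x BDIMY) layout.
	size_t irow = gidx / blockDim.y;
	size_t icol = gidx % blockDim.y;

	// Row-wise store: consecutive threads of a warp write consecutive words.
	smem[threadIdx.y][threadIdx.x] = m[gidx];

	__syncthreads();

	// Column-wise load: this is the shared memory load in question.
	result[gidx] = smem[icol][irow];
}

int main() {
	int *mat = new int[BDIMX*BDIMY];
	int *h_result = new int[BDIMX*BDIMY];

	// Fill the input so the copy below does not read uninitialized memory.
	for (int i = 0; i < BDIMX * BDIMY; i++) mat[i] = i;

	int *d_mat, *d_result;
	int mat_byte = BDIMX * BDIMY * sizeof(int);

	cudaMalloc(&d_mat, mat_byte);
	cudaMalloc(&d_result, mat_byte);

	cudaMemcpy(d_mat, mat, mat_byte, cudaMemcpyHostToDevice);

	// One block of 32 x 16 = 512 threads, i.e. 16 warps.
	dim3 block(BDIMX, BDIMY);

	matrixTranspose1<<<1, block>>>(d_result, d_mat);

	// This blocking copy also waits for the kernel to finish.
	cudaMemcpy(h_result, d_result, mat_byte, cudaMemcpyDeviceToHost);

	cudaFree(d_mat);
	cudaFree(d_result);

	delete[] mat;
	delete[] h_result;

	return 0;
}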

In my code, I expected 256 bank conflicts, but the Nsight Compute report shows 240 bank conflicts for the shared memory load.

Please help me understand why Nsight Compute reports 240 conflicts.

https://github.com/ys9617/bin/blob/master/shared%20memory%20bank%20conflict.PNG

Greg · February 10, 2020, 4:46am

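Each instruction has 16 requests. The first 15 requests have a bank conflict. The last request does not, because no conflicting accesses remain after it. The block has 16 warps, so the total is 16 × 15 = 240 rather than 16 × 16 = 256.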
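To check the arithmetic, here is a rough host-side sketch that replays the kernel's load index math for each warp and counts how many passes the load would need. It is only a model, not how Nsight Compute computes its counters: it assumes 32 banks of 4-byte words, 32 threads per warp, and that a warp needs as many passes as the largest number of distinct words mapped to a single bank. The names `ConflictStats` and `modelTransposeLoad` are made up for this illustration. It is plain host code, so it should build either as C++ or inside a .cu file.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Assumptions of this model (not taken from the report): 32 banks of
// 4-byte words and 32 threads per warp.
constexpr int kWarpSize = 32;
constexpr int kNumBanks = 32;

struct ConflictStats {
    int warps;       // warps in the block, one load instruction each
    int wavefronts;  // total passes needed to serve all loads
    int conflicts;   // passes beyond the first one of each load
};

// Replays the index math of `result[gidx] = smem[icol][irow]` for a
// bdimx x bdimy block with `__shared__ int smem[bdimy][bdimx]`.
ConflictStats modelTransposeLoad(int bdimx, int bdimy) {
    assert(bdimx > 0 && bdimy > 0);
    ConflictStats stats{0, 0, 0};
    const int threads = bdimx * bdimy;

    for (int warpStart = 0; warpStart < threads; warpStart += kWarpSize) {
        // Distinct 4-byte words this warp touches in each bank.
        std::set<int> wordsPerBank[kNumBanks];

        for (int lane = 0; lane < kWarpSize && warpStart + lane < threads; ++lane) {
            const int gidx = warpStart + lane;
            const int irow = gidx / bdimy;
            const int icol = gidx % bdimy;
            const int word = icol * bdimx + irow;  // word offset of smem[icol][irow]
            wordsPerBank[word % kNumBanks].insert(word);
        }

        // The busiest bank decides how many passes the load takes.
        int passes = 0;
        for (int bank = 0; bank < kNumBanks; ++bank) {
            passes = std::max(passes, static_cast<int>(wordsPerBank[bank].size()));
        }

        stats.warps += 1;
        stats.wavefronts += passes;
        stats.conflicts += passes - 1;  // the final pass is not counted as a conflict
    }
    return stats;
}
```

For the 32 × 16 block in the question, `modelTransposeLoad(32, 16)` gives 16 warps, 256 passes, and 240 conflicts under this model: each warp hits two banks with 16 different words each, so it needs 16 passes, of which only the first 15 count as conflicts.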
