Analyzing bank conflicts with Nsight Compute

yannick.ongena · August 14, 2020, 10:08am

I’m following a book on CUDA, and it shows the following example to illustrate bank conflicts.
The book uses the Visual Profiler, but because I have a newer GPU, I need to use Nsight Compute.

This is the kernel:

```cuda
__global__ void matrix_transpose_shared(int* input, int* output) {

	// BLOCK_SIZE and N are defined outside this snippet
	__shared__ int sharedMemory[BLOCK_SIZE][BLOCK_SIZE];

	// global index
	int indexX = threadIdx.x + blockIdx.x * blockDim.x;
	int indexY = threadIdx.y + blockIdx.y * blockDim.y;

	// transposed global memory index
	int tindexX = threadIdx.x + blockIdx.y * blockDim.x;
	int tindexY = threadIdx.y + blockIdx.x * blockDim.y;

	// local index
	int localIndexX = threadIdx.x;
	int localIndexY = threadIdx.y;

	int index = indexY * N + indexX;
	int transposedIndex = tindexY * N + tindexX;

	// reading from global memory in a coalesced manner and performing the transpose in shared memory
	// (column-wise store: with BLOCK_SIZE = 32, all 32 threads of a warp write to the same bank)
	sharedMemory[localIndexX][localIndexY] = input[index];

	__syncthreads();

	// writing into global memory in a coalesced fashion using the transposed data in shared memory
	output[transposedIndex] = sharedMemory[localIndexY][localIndexX];
}
```

When I profile my code in Nsight Compute, it doesn’t give me any warning in the Memory Workload Analysis section…

The only warning that comes close is in the Source Counters section, where I get a warning about uncoalesced shared accesses.
The purpose of this sample was to solve the problem of uncoalesced global memory access, which is why it moves the transpose into shared memory…

rs277 · August 14, 2020, 7:26pm

While it’s true that you don’t get an explicit warning, if you expand the “Memory Workload Analysis” section, you will see the bank conflicts listed in the Shared Memory section.
