Analyzing bank conflicts with Nsight Compute

I’m following a book on CUDA, and it shows the following example to illustrate bank conflicts.
The book uses the Visual Profiler, but because I have a newer GPU, I need to use Nsight Compute.

This is the kernel:

__global__ void matrix_transpose_shared(int* input, int* output) {

	__shared__ int sharedMemory[BLOCK_SIZE][BLOCK_SIZE];

	// global index	
	int indexX = threadIdx.x + blockIdx.x * blockDim.x;
	int indexY = threadIdx.y + blockIdx.y * blockDim.y;

	// transposed global memory index
	int tindexX = threadIdx.x + blockIdx.y * blockDim.x;
	int tindexY = threadIdx.y + blockIdx.x * blockDim.y;

	// local index
	int localIndexX = threadIdx.x;
	int localIndexY = threadIdx.y;

	int index = indexY * N + indexX;
	int transposedIndex = tindexY * N + tindexX;

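	// read from global memory in a coalesced manner and perform the transpose in shared memory
	sharedMemory[localIndexX][localIndexY] = input[index];

	// wait until every thread in the block has written its element,
	// because the read below uses an element written by a different thread
	__syncthreads();

	// write to global memory in a coalesced manner from the transposed data in shared memory
	output[transposedIndex] = sharedMemory[localIndexY][localIndexX];
}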

When I profile my code in Nsight Compute, it doesn’t even give me a warning in the Memory Workload Analysis section…

The only warning that comes close is in the Source Counters section, where I get a warning about uncoalesced shared accesses.
The purpose of this sample was to fix uncoalesced access to global memory, which is why the sample moved to shared memory…

While it’s true you don’t get an explicit warning, if you expand the “Memory Workload Analysis” section, you will see conflicts listed in the Shared Memory section.