Are bank conflicts random or deterministic?

Hi,

I have some code whose kernels use shared memory. When I run the same executable several times, without changing anything, I get different outputs.
At first I suspected bank conflicts, but after reading about them for a few days I now have some doubts.

Are bank conflicts random or deterministic phenomena?

In my understanding, bank conflicts should be deterministic rather than random. That is, for a given kernel configuration and the same sizes of the global and shared arrays, one should either always get bank conflicts or never get them. It should not behave like what I am seeing now…

Cheers,
Mikhail

When you are talking about different outputs, is it different performance or different results?

Bank conflicts (in the absence of other timing-dependent bugs) only worsen the performance, but do not change the results.
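To illustrate with a minimal, hypothetical kernel (not your code): the two sums below read the same data and the program prints the same value on every run; only the timing of the two read loops differs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// For a fixed i, tile[tx][i] makes all 32 threads of a warp hit the same
// 4-byte bank (a 32-way conflict), while tile[i][tx] spreads the warp
// across all 32 banks (conflict-free). The conflicts serialize the
// accesses -- slower, but the values read are unchanged.
__global__ void bankConflictDemo(float *out)
{
    __shared__ float tile[32][32];
    int tx = threadIdx.x;

    for (int i = 0; i < 32; ++i)          // thread tx fills column tx
        tile[i][tx] = (float)(i * 32 + tx);
    __syncthreads();                      // all writes visible before reads

    float conflicted = 0.0f, conflictFree = 0.0f;
    for (int i = 0; i < 32; ++i) {
        conflicted   += tile[tx][i];      // warp hits a single bank per i
        conflictFree += tile[i][tx];      // warp spans all banks per i
    }
    out[tx] = conflicted + conflictFree;  // deterministic either way
}

int main()
{
    float *d_out, h_out[32];
    cudaMalloc(&d_out, 32 * sizeof(float));
    bankConflictDemo<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);    // identical on every run
    cudaFree(d_out);
    return 0;
}
```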

I am talking about different results, different values of the outputs. While debugging I have plenty of print outs, and some of those are different in different runs.

Then it is unrelated or only indirectly related to the bank conflicts.

Revisit your algorithm and the synchronization: have you used __syncthreads() or __syncwarp() when exchanging data over shared memory? Have you declared the shared memory as volatile?
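For reference, a minimal sketch of the pattern in question (hypothetical kernel, assuming one block of 256 threads and an array of at least 256 elements):

```cuda
// Threads exchange values through shared memory. Without the
// __syncthreads() barrier between the write and the read, a thread may
// read a slot its neighbour has not written yet, and the output then
// varies from run to run -- exactly the symptom described above.
__global__ void reverseBlock(float *data)
{
    __shared__ float buf[256];
    int t = threadIdx.x;

    buf[t] = data[t];        // every thread writes one slot
    __syncthreads();         // all writes visible before any read
    data[t] = buf[255 - t];  // read a slot written by another thread
}
```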

Have you run your program with the Compute Sanitizer?
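For the symptom you describe, the racecheck tool is the relevant mode, e.g. compute-sanitizer --tool racecheck ./your_app (the executable name is a placeholder); it specifically reports shared-memory data access hazards.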

To your original question: If your program is bug-free, bank conflicts and the order they are resolved will not change the output of your program.

Yes, I am using __syncthreads() after filling shared memory.
No, no volatile shared variables.
No, I have not used any debugger or sanitizer so far…

Okay, thanks a lot. You've confirmed my understanding that it must be something else…

You normally should use volatile shared variables; it was at least necessary in the past:

Otherwise the compiler is free to cache the contents in registers or remove the code for reading or writing shared memory.

The compiler analyzes code as if it were single-threaded and does not consider that other threads may write to or read from the same shared memory.

Could it be that the compiler detects __syncthreads() nowadays and ensures the actual shared memory accesses take place?
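The classic case where volatile mattered is the old warp-synchronous reduction tail, sketched below (adapted from the well-known reduction idiom, not your code):

```cuda
// Old-style warp-synchronous reduction tail. The volatile qualifier
// forces each sdata[tid] access to be a real shared-memory load/store;
// without it the compiler may keep the running sum in a register and
// never observe the other lanes' updates. (On Volta and later this
// idiom additionally needs __syncwarp() between the steps, since the
// lanes of a warp are no longer guaranteed to execute in lockstep.)
__device__ void warpReduce(volatile float *sdata, int tid)
{
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid +  8];
    sdata[tid] += sdata[tid +  4];
    sdata[tid] += sdata[tid +  2];
    sdata[tid] += sdata[tid +  1];
}
```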

Well… I am using shared variables for computing derivatives, similar to the samples described in the book
CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming.
Volatile variables were not used there.
The same kernel subroutines worked with other test samples, so I guess they should be fine (although one can never be 100% sure). So the shared memory should work more or less as expected.
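For context, the shared-memory derivative pattern in question looks roughly like this, sketched in CUDA C rather than the book's CUDA Fortran; the names, block size, and periodic boundary handling are illustrative assumptions (n is assumed to be a multiple of BLOCK and covered exactly by the grid):

```cuda
#define BLOCK  256
#define RADIUS 2   // 4th-order central difference needs 2 halo cells

__global__ void ddx(const float *f, float *df, float invDx, int n)
{
    __shared__ float s[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // local index incl. halo

    s[l] = f[g];                                    // interior point
    if (threadIdx.x < RADIUS) {                     // first threads load halos
        s[l - RADIUS] = f[(g - RADIUS + n) % n];    // left halo (periodic)
        s[l + BLOCK]  = f[(g + BLOCK) % n];         // right halo (periodic)
    }
    __syncthreads();  // every halo cell written before any neighbour read

    // f'(x) ~ (8*(f[i+1] - f[i-1]) - (f[i+2] - f[i-2])) / (12*dx)
    df[g] = (8.0f * (s[l + 1] - s[l - 1]) - (s[l + 2] - s[l - 2]))
            * invDx / 12.0f;
}
```

If the kernels really follow this shape, the barrier placement is sound, and racecheck is the quickest way to find whichever code path is missing one.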