Counting the number of bank conflicts.


Is there any information available on how the cudaProfiler calculates the number of bank conflicts and populates the “warp serialized” counter?
As far as I know, the PTX says nothing about how shared memory is accessed, so where does the knowledge of bank conflicts come from?


My guess is that a hardware performance counter records this, but that is just speculation.
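
For what it's worth, here is a rough host-side sketch of why the count would have to come from run time rather than from the PTX: which bank an access hits depends on the byte address each thread computes while the kernel runs. This is only a simplified model, not how the profiler or the hardware actually counts. The function names, the 32 banks, and the 4-byte bank width are my assumptions (older hardware uses 16 banks per half-warp), and threads reading the same word are treated as a broadcast with no conflict.

```python
from collections import defaultdict


def conflict_degree(byte_addresses, num_banks=32, bank_width=4):
    """Serialized passes needed for one warp-wide shared-memory access.

    Simplified model (assumption): consecutive bank_width-byte words map to
    consecutive banks, and threads touching the same word are broadcast.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width
        words_per_bank[word % num_banks].add(word)
    # The busiest bank determines how many passes the access takes.
    return max((len(words) for words in words_per_bank.values()), default=0)


def strided_access(stride_words, warp_size=32, bank_width=4):
    """Byte addresses for threads reading shared[tid * stride_words]."""
    return [tid * stride_words * bank_width for tid in range(warp_size)]


if __name__ == "__main__":
    for stride in (1, 2, 32):
        degree = conflict_degree(strided_access(stride))
        print(f"stride {stride}: {degree}-way access")
```

In this model a stride of 1 word is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 puts every thread in the same bank. The stride is often a kernel argument or a value loaded from memory, so it cannot in general be read off the PTX.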