Hi,
Is there any information available on how the cudaProfiler calculates the number of bank conflicts and populates the “warp serialized” counter?
As far as I can tell, the PTX says nothing about how shared memory is accessed, so where does the knowledge of bank conflicts come from?
Thanks.