How to measure shared memory bandwidth

Hi guys. I have read a lot of papers that say the bandwidth between shared memory and device memory is the bottleneck, and they provide some statistics to support this. But they do not explain how to measure the utilization of on-chip bandwidth. Could you tell me? Is there some way to do this, such as a -ptx compiler flag or some programming statements? Thanks.

Does anyone know?

Please don’t cross-post a topic.