Memory Performance Evaluation

I would like to evaluate the memory performance of my GPU kernels, specifically the amount of data moved between the different levels of the memory hierarchy and the effective memory bandwidth. Any pointers on how to get this information?
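
For reference, by effective bandwidth I mean the usual definition: bytes read plus bytes written by the kernel, divided by its elapsed time. The numbers below are a hypothetical example, not a measurement. They assume a simple copy kernel that reads and writes each of $10^7$ single-precision floats (4 bytes each) exactly once and takes 1 ms:

```latex
\mathrm{BW}_{\text{eff}} = \frac{B_r + B_w}{t \cdot 10^{9}}\ \text{GB/s}

% Hypothetical copy kernel: N = 10^7 floats, t = 1\,\text{ms} = 10^{-3}\,\text{s}
B_r = B_w = 10^{7} \times 4\ \text{B} = 4 \times 10^{7}\ \text{B}

\mathrm{BW}_{\text{eff}} = \frac{4 \times 10^{7} + 4 \times 10^{7}}{10^{-3} \cdot 10^{9}}
                        = \frac{8 \times 10^{7}}{10^{6}} = 80\ \text{GB/s}
```

What I'm missing is a reliable way to get $B_r$, $B_w$ and $t$ for real kernels, especially where caches make the actual traffic differ from what the code appears to access.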

Thanks!
Rajesh