Measuring used memory bandwidth

Hi,

I implemented a simple matrix multiplication using CUDA on a Linux machine.
How can I measure the ratio of the achieved (practical) memory bandwidth to the theoretical peak memory bandwidth?

Thanks