Hi,
I implemented a simple matrix multiplication using CUDA on a Linux machine.
How can I measure the ratio of practical (achieved) memory bandwidth to theoretical memory bandwidth?
Thanks