How to measure GPU Memory Bandwidth?

I have a GeForce RTX 4060 Ti 16GB, and I want to measure the bandwidth between the GPU and its VRAM. As far as I know, the theoretical memory bandwidth for this model should be
18 Gbps per pin * 128-bit bus / 8 bits per byte = 288 GB/s.
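The arithmetic above can be written out as a small helper. This is only a sketch of the peak-bandwidth formula; the function name is hypothetical, and it assumes the data rate is quoted per pin, as it is for GDDR6.

```python
def theoretical_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (hypothetical helper).

    data_rate_gbps: per-pin data rate in Gbit/s (e.g. 18 for 18 Gbps GDDR6).
    bus_width_bits: width of the memory interface in bits (e.g. 128).
    """
    # Every pin moves data_rate_gbps gigabits per second; multiply by the
    # number of pins (the bus width) and divide by 8 to convert bits to bytes.
    return data_rate_gbps * bus_width_bits / 8


print(theoretical_bandwidth_gbs(18, 128))  # 288.0
```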
I tried the p2pBandwidthLatencyTest tool from the CUDA Samples, but the results differ significantly from 288 GB/s. Is this normal, or is there a better tool you would recommend?

You won't be able to achieve the published bandwidth. It is a theoretical maximum that no real-world test will reach. A measurement of 243.25 GB/s against a theoretical maximum of 288 GB/s (about 84% of peak) would be normal.
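To put a measured number in context, it helps to divide it by the theoretical peak. The sketch below (plain Python; both helper names are hypothetical) converts a timed copy into GB/s and reports the fraction of peak. It assumes the copy benchmark counts both the bytes read and the bytes written, which is a common convention for device-to-device copies but should be checked against whichever tool produced the number.

```python
def copy_bandwidth_gbs(bytes_copied: int, seconds: float,
                       count_read_and_write: bool = True) -> float:
    """Effective bandwidth in GB/s for a timed copy (hypothetical helper)."""
    # Assumption: a device-to-device copy of N bytes reads N bytes and writes
    # N bytes, so the DRAM traffic is counted as 2 * N unless told otherwise.
    traffic = 2 * bytes_copied if count_read_and_write else bytes_copied
    return traffic / seconds / 1e9


def fraction_of_peak(measured_gbs: float, peak_gbs: float) -> float:
    """Measured bandwidth as a fraction of the theoretical peak."""
    return measured_gbs / peak_gbs


# Using the figures from the answer: 243.25 GB/s measured vs. 288 GB/s peak.
print(round(fraction_of_peak(243.25, 288.0), 3))  # 0.845
```

If the tool you use counts only one direction of traffic, pass `count_read_and_write=False`; otherwise the reported bandwidth will be off by a factor of two.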