GPU Bandwidth

Hello all,

I ran CUDA sample code to test the bandwidth of an RTX 3090 and measured 25 GB/s, but the theoretical bandwidth of PCIe Gen 4 x16 should be 32 GB/s.
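For reference, the nominal 32 GB/s figure is the raw signaling rate. This is a minimal sketch that assumes Gen 4 runs at 16 GT/s per lane over 16 lanes with 128b/130b line encoding. It shows that the usable line rate per direction is already slightly below 32 GB/s before any TLP or DLLP overhead is counted. The function name `pcie_link_bandwidth_gbs` is hypothetical.

```python
# Theoretical per-direction PCIe line rate, assuming Gen 4 signaling
# (16 GT/s per lane) and 128b/130b encoding (used by PCIe Gen 3 and later).

def pcie_link_bandwidth_gbs(gt_per_s: float, lanes: int) -> float:
    """Return the line rate in GB/s (10^9 bytes/s) for one direction."""
    encoding_efficiency = 128 / 130           # 128b/130b line encoding
    bits_per_s = gt_per_s * 1e9 * lanes * encoding_efficiency
    return bits_per_s / 8 / 1e9               # bits -> bytes -> GB

gen4_x16 = pcie_link_bandwidth_gbs(16.0, 16)
print(f"PCIe Gen 4 x16: {gen4_x16:.2f} GB/s")  # about 31.51 GB/s
```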
I know that PCIe data transfers carry protocol overhead: TLP headers, DLLPs, and a per-TLP payload limit set by the Max Payload Size.
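One way to see how Max Payload Size affects throughput is to model each TLP as payload plus a fixed overhead. This is a rough sketch under stated assumptions: 24 bytes of overhead per TLP (for example a 16-byte header plus 8 bytes of framing, sequence number, and LCRC, with no ECRC), a Gen 4 x16 line rate after 128b/130b encoding, and no DLLP traffic (ACKs and flow-control updates are not modeled). The overhead value and the name `tlp_efficiency` are assumptions for illustration.

```python
# Rough TLP efficiency model: each TLP carries at most max_payload bytes
# of data plus a fixed per-TLP overhead. DLLPs are NOT modeled here.

ASSUMED_TLP_OVERHEAD = 24  # bytes per TLP; an assumption, not a measured value

def tlp_efficiency(max_payload: int, tlp_overhead: int = ASSUMED_TLP_OVERHEAD) -> float:
    """Fraction of link bytes that are payload for full-size TLPs."""
    return max_payload / (max_payload + tlp_overhead)

# Gen 4 x16 line rate in GB/s after 128b/130b encoding (assumed link config).
link_gbs = 16.0 * 16 * (128 / 130) / 8

for mps in (128, 256, 512):
    eff = tlp_efficiency(mps)
    print(f"MPS {mps:4d} B: efficiency {eff:.3f}, about {link_gbs * eff:.2f} GB/s")
```

Larger Max Payload Sizes amortize the fixed per-TLP overhead over more data, which is why the estimated payload bandwidth rises with MPS.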
How can I calculate the raw (link-level) bandwidth from the data size of a CUDA memory copy? Is there a formula for this?
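A simple first-order formula: the effective bandwidth is the copy size divided by the elapsed time, while the raw bandwidth counts every byte on the link, i.e. the payload plus the per-TLP overhead for each of the ceil(size / MPS) TLPs. This sketch assumes a Max Payload Size of 256 bytes and 24 bytes of overhead per TLP, both hypothetical values; it ignores DLLPs and, depending on how the DMA engine issues the transfer, any read-request and completion traffic. The example elapsed time is derived from the 25 GB/s reported above, not from a new measurement.

```python
import math

# Hypothetical link parameters; replace with your platform's actual values.
ASSUMED_MPS = 256           # Max Payload Size in bytes
ASSUMED_TLP_OVERHEAD = 24   # header + framing + sequence number + LCRC, in bytes

def wire_bytes(copy_bytes: int, max_payload: int = ASSUMED_MPS,
               tlp_overhead: int = ASSUMED_TLP_OVERHEAD) -> int:
    """Bytes on the link for a copy split into TLPs of at most max_payload bytes."""
    num_tlps = math.ceil(copy_bytes / max_payload)
    return copy_bytes + num_tlps * tlp_overhead

def bandwidths_gbs(copy_bytes: int, elapsed_s: float,
                   max_payload: int = ASSUMED_MPS,
                   tlp_overhead: int = ASSUMED_TLP_OVERHEAD) -> tuple[float, float]:
    """Return (effective, raw) bandwidth in GB/s for one timed copy."""
    effective = copy_bytes / elapsed_s / 1e9
    raw = wire_bytes(copy_bytes, max_payload, tlp_overhead) / elapsed_s / 1e9
    return effective, raw

# Example: a 32 MiB copy timed at the 25 GB/s reported in the post.
size = 32 * 1024 * 1024
elapsed = size / 25e9
effective, raw = bandwidths_gbs(size, elapsed)
print(f"effective {effective:.2f} GB/s, raw {raw:.2f} GB/s")
```

Under these assumptions, the raw link bandwidth is the effective bandwidth multiplied by (MPS + overhead) / MPS, so a measured 25 GB/s corresponds to roughly 27.3 GB/s on the wire. Any remaining gap to the roughly 31.5 GB/s line rate would have to come from overheads this model leaves out.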