Using the bandwidthTest tool (/usr/local/cuda/samples/1_Utilities/bandwidthTest/), I measured a peak device-to-device (D2D) bandwidth of 864.3 GB/s, which is higher than the official memory bandwidth of the NVIDIA GeForce RTX 3080 (760 GB/s). Is this result reasonable, and if so, why?
Also, why is the result multiplied by 2.0 when the bandwidth is calculated? The test code is as follows:
// calculate bandwidth in GB/s
float time_s = elapsedTimeInMs / (float)1e3;
bandwidthInGBs = (2.0f * memSize * (float)MEMCOPY_ITERATIONS) / (float)1e9;
bandwidthInGBs = bandwidthInGBs / time_s;
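To make the arithmetic concrete, here is a minimal standalone sketch of the same formula. The function name `d2dBandwidthGBs` and the sample numbers below are hypothetical and are not taken from the sample. My reading of the factor 2.0 is that a device-to-device copy reads every byte from the source buffer and writes it to the destination buffer, so each copied byte is counted as two memory transactions; treat that as an assumption rather than a confirmed answer.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the formula quoted from bandwidthTest.
//   memSize         : bytes transferred by one copy
//   iterations      : number of copies that were timed (MEMCOPY_ITERATIONS)
//   elapsedTimeInMs : total time for all copies, in milliseconds
double d2dBandwidthGBs(std::size_t memSize, int iterations, double elapsedTimeInMs) {
    const double time_s = elapsedTimeInMs / 1e3;
    // Factor 2.0: assumed to count each byte once as a read and once as a write.
    const double bytesMoved = 2.0 * static_cast<double>(memSize) * iterations;
    return bytesMoved / 1e9 / time_s;
}
```

For example, `d2dBandwidthGBs(1000000, 100, 1.0)` (100 copies of 1,000,000 bytes in 1 ms in total) evaluates to about 200 GB/s, although only 100 GB/s worth of data was actually copied.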
The test data is as follows:
bandwidthTest-D2D, Bandwidth = 0.8 GB/s, Time = 0.00000 s, Size = 1000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 0.7 GB/s, Time = 0.00000 s, Size = 2000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 1.4 GB/s, Time = 0.00000 s, Size = 3000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 3.1 GB/s, Time = 0.00000 s, Size = 4000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 3.8 GB/s, Time = 0.00000 s, Size = 5000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 4.5 GB/s, Time = 0.00000 s, Size = 6000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 5.3 GB/s, Time = 0.00000 s, Size = 7000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 6.2 GB/s, Time = 0.00000 s, Size = 8000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 7.1 GB/s, Time = 0.00000 s, Size = 9000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 7.7 GB/s, Time = 0.00000 s, Size = 10000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 8.6 GB/s, Time = 0.00000 s, Size = 11000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 9.3 GB/s, Time = 0.00000 s, Size = 12000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 10.3 GB/s, Time = 0.00000 s, Size = 13000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 10.9 GB/s, Time = 0.00000 s, Size = 14000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 11.7 GB/s, Time = 0.00000 s, Size = 15000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 12.4 GB/s, Time = 0.00000 s, Size = 16000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 13.2 GB/s, Time = 0.00000 s, Size = 17000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 14.0 GB/s, Time = 0.00000 s, Size = 18000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 14.9 GB/s, Time = 0.00000 s, Size = 19000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 15.5 GB/s, Time = 0.00000 s, Size = 20000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 17.2 GB/s, Time = 0.00000 s, Size = 22000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 18.6 GB/s, Time = 0.00000 s, Size = 24000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 20.3 GB/s, Time = 0.00000 s, Size = 26000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 21.9 GB/s, Time = 0.00000 s, Size = 28000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 23.6 GB/s, Time = 0.00000 s, Size = 30000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 24.7 GB/s, Time = 0.00000 s, Size = 32000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 26.3 GB/s, Time = 0.00000 s, Size = 34000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 28.1 GB/s, Time = 0.00000 s, Size = 36000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 29.5 GB/s, Time = 0.00000 s, Size = 38000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 30.9 GB/s, Time = 0.00000 s, Size = 40000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 33.1 GB/s, Time = 0.00000 s, Size = 42000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 34.0 GB/s, Time = 0.00000 s, Size = 44000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 35.9 GB/s, Time = 0.00000 s, Size = 46000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 37.0 GB/s, Time = 0.00000 s, Size = 48000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 39.0 GB/s, Time = 0.00000 s, Size = 50000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 46.7 GB/s, Time = 0.00000 s, Size = 60000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 55.1 GB/s, Time = 0.00000 s, Size = 70000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 63.4 GB/s, Time = 0.00000 s, Size = 80000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 70.3 GB/s, Time = 0.00000 s, Size = 90000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 78.9 GB/s, Time = 0.00000 s, Size = 100000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 157.6 GB/s, Time = 0.00000 s, Size = 200000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 236.1 GB/s, Time = 0.00000 s, Size = 300000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 314.2 GB/s, Time = 0.00000 s, Size = 400000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 395.9 GB/s, Time = 0.00000 s, Size = 500000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 473.6 GB/s, Time = 0.00000 s, Size = 600000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 557.8 GB/s, Time = 0.00000 s, Size = 700000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 637.8 GB/s, Time = 0.00000 s, Size = 800000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 716.3 GB/s, Time = 0.00000 s, Size = 900000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 796.3 GB/s, Time = 0.00000 s, Size = 1000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 864.3 GB/s, Time = 0.00000 s, Size = 2000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 589.9 GB/s, Time = 0.00001 s, Size = 3000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 526.0 GB/s, Time = 0.00001 s, Size = 4000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 514.1 GB/s, Time = 0.00001 s, Size = 5000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 533.1 GB/s, Time = 0.00001 s, Size = 6000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 544.8 GB/s, Time = 0.00001 s, Size = 7000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 560.1 GB/s, Time = 0.00001 s, Size = 8000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 577.2 GB/s, Time = 0.00002 s, Size = 9000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 586.3 GB/s, Time = 0.00002 s, Size = 10000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 604.1 GB/s, Time = 0.00002 s, Size = 11000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 609.4 GB/s, Time = 0.00002 s, Size = 12000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 606.8 GB/s, Time = 0.00002 s, Size = 13000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 611.0 GB/s, Time = 0.00002 s, Size = 14000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 619.5 GB/s, Time = 0.00002 s, Size = 15000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 623.6 GB/s, Time = 0.00003 s, Size = 16000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 626.4 GB/s, Time = 0.00003 s, Size = 18000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 634.0 GB/s, Time = 0.00003 s, Size = 20000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 637.9 GB/s, Time = 0.00003 s, Size = 22000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 644.1 GB/s, Time = 0.00004 s, Size = 24000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 645.0 GB/s, Time = 0.00004 s, Size = 26000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 649.1 GB/s, Time = 0.00004 s, Size = 28000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 649.9 GB/s, Time = 0.00005 s, Size = 30000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 653.9 GB/s, Time = 0.00005 s, Size = 32000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 655.3 GB/s, Time = 0.00005 s, Size = 36000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 659.1 GB/s, Time = 0.00006 s, Size = 40000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 661.9 GB/s, Time = 0.00007 s, Size = 44000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 664.1 GB/s, Time = 0.00007 s, Size = 48000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 665.6 GB/s, Time = 0.00008 s, Size = 52000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 666.9 GB/s, Time = 0.00008 s, Size = 56000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 668.4 GB/s, Time = 0.00009 s, Size = 60000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 676.1 GB/s, Time = 0.00009 s, Size = 64000000 bytes, NumDevsUsed = 1
bandwidthTest-D2D, Bandwidth = 670.5 GB/s, Time = 0.00010 s, Size = 68000000 bytes, NumDevsUsed = 1
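One thing that stands out in this data is that the peak is at 2,000,000 bytes and the bandwidth drops sharply at 3,000,000 bytes. A possible explanation, offered only as a hypothesis, is that for small sizes the source and destination buffers both fit in the L2 cache, so the copy is not limited by DRAM bandwidth. The sketch below checks this against the L2 cache size of 5242880 bytes that deviceQuery reports for this card; the helper name `copyFitsInL2` is hypothetical.

```cpp
#include <cstddef>

// L2 cache size reported by deviceQuery for this RTX 3080, in bytes.
constexpr std::size_t kL2CacheBytes = 5242880;

// Hypothetical helper: true if the source and destination buffers of a
// D2D copy of `memSize` bytes could both be resident in L2 at once.
// This ignores other L2 users and any cache policy details.
constexpr bool copyFitsInL2(std::size_t memSize) {
    return 2 * memSize <= kL2CacheBytes;
}
```

Under this assumption, `copyFitsInL2(2000000)` is true (4,000,000 bytes needed) and `copyFitsInL2(3000000)` is false (6,000,000 bytes needed), which lines up with where the measured bandwidth falls below the 760 GB/s figure.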
The deviceQuery output for this GPU is as follows:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3080"
CUDA Driver Version / Runtime Version 11.7 / 11.7
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 10018 MBytes (10504437760 bytes)
(068) Multiprocessors, (128) CUDA Cores/MP: 8704 CUDA Cores
GPU Max Clock rate: 1710 MHz (1.71 GHz)
Memory Clock rate: 9501 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 5242880 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 101 / 0
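For reference, the official 760 GB/s figure can be reproduced from the deviceQuery values above (memory clock 9501 MHz, bus width 320 bits). The sketch below follows the theoretical bandwidth calculation described in the CUDA C++ Best Practices Guide; the function name `peakBandwidthGBs` is hypothetical, and the factor of 2 assumes that the reported clock transfers data twice per cycle (double data rate).

```cpp
// Theoretical peak memory bandwidth in GB/s from deviceQuery values.
//   memClockMHz  : "Memory Clock rate" in MHz
//   busWidthBits : "Memory Bus Width" in bits
constexpr double peakBandwidthGBs(double memClockMHz, int busWidthBits) {
    // busWidthBits / 8.0 converts the bus width to bytes per transfer;
    // 2.0 is the assumed double-data-rate factor.
    return memClockMHz * 1e6 * (busWidthBits / 8.0) * 2.0 / 1e9;
}
```

With the values reported here, `peakBandwidthGBs(9501.0, 320)` gives 760.08 GB/s, so any measurement above that cannot come from DRAM traffic alone.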