Ampere GPU cards' poor P2P results

The default results of p2pBandwidthLatencyTest from the CUDA samples are poor on Ampere datacenter GPUs.
I have to lock the memory and core frequencies with the nvidia-smi -ac command to get better results. The values shown below compare the results with and without nvidia-smi -ac. This phenomenon does not reproduce on older architectures such as V100. So what I want to know is: why are Ampere GPUs' P2P results so poor at the default frequencies?

  1. FB and GPU frequencies locked with the nvidia-smi -ac command:
    Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 777.75 16.11 16.16 16.14 21.24 21.25 21.21 21.22
    1 16.15 777.75 16.17 16.14 21.28 21.20 21.30 21.23
    2 16.14 16.13 777.75 16.15 21.37 21.28 21.28 21.27
    3 16.12 16.13 16.18 777.75 21.29 21.21 21.22 21.22
    4 21.18 21.25 21.21 21.16 777.75 16.26 16.26 16.26
    5 21.09 21.08 21.13 21.08 16.27 778.53 16.24 16.27
    6 21.24 21.10 21.22 21.15 16.29 16.25 778.53 16.30
    7 21.18 21.18 21.19 21.16 16.29 16.28 16.27 778.53
    Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 776.98 93.38 25.77 25.77 18.50 18.51 18.51 18.52
    1 93.41 783.21 25.77 25.77 18.50 18.51 18.51 18.51
    2 25.77 25.77 783.60 93.40 18.51 18.50 18.52 18.50
    3 25.77 25.76 93.40 782.42 18.51 18.51 18.51 18.51
    4 18.51 18.50 18.51 18.51 782.82 93.40 25.77 25.77
    5 18.50 18.50 18.52 18.51 93.42 782.42 25.77 25.77
    6 18.51 18.50 18.51 18.51 25.77 25.77 783.60 93.40
    7 18.50 18.50 18.51 18.52 25.77 25.77 93.39 782.03
    Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 782.03 17.39 17.41 17.25 27.52 27.39 27.37 27.55
    1 17.34 781.84 17.28 17.34 27.51 27.39 27.44 27.50
    2 17.42 17.34 782.23 17.41 27.60 27.47 27.44 27.60
    3 17.38 17.36 17.36 781.45 27.55 27.41 27.38 27.49
    4 27.61 27.51 27.70 27.64 783.21 17.38 17.42 17.44
    5 27.45 27.44 27.42 27.46 17.38 782.82 17.37 17.40
    6 27.61 27.50 27.62 27.61 17.44 17.40 781.84 17.42
    7 27.66 27.50 27.55 27.58 17.39 17.37 17.37 782.42
    Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 781.64 185.11 51.52 51.50 35.54 35.53 35.53 35.53
    1 185.11 781.84 51.48 51.47 35.54 35.56 35.54 35.53
    2 51.50 51.50 782.03 185.09 35.54 35.55 35.53 35.54
    3 51.53 51.53 185.13 780.86 35.53 35.55 35.55 35.53
    4 35.56 35.57 35.53 35.54 782.62 185.09 51.52 51.53
    5 35.54 35.56 35.56 35.54 185.11 782.23 51.52 51.53
    6 35.54 35.56 35.53 35.56 51.52 51.51 783.21 185.02
    7 35.54 35.56 35.54 35.54 51.52 51.52 185.15 780.86

  2. Default results without the nvidia-smi -ac command:
    Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 726.41 15.67 15.62 15.64 18.40 20.39 20.08 19.80
    1 15.91 722.71 15.69 15.67 19.81 20.40 20.13 19.84
    2 15.94 15.94 727.42 15.68 19.85 20.39 20.11 19.88
    3 15.86 15.89 16.07 777.36 19.84 20.39 20.10 19.82
    4 17.93 17.91 18.42 21.21 778.14 15.97 15.94 15.92
    5 18.40 18.41 18.40 20.79 16.18 777.75 15.92 15.92
    6 17.90 18.14 18.14 20.83 16.17 16.16 777.75 15.90
    7 17.75 17.59 17.92 17.91 16.18 16.18 16.14 778.91
    Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 727.08 92.91 16.72 19.93 18.55 18.55 18.54 18.54
    1 91.58 726.07 16.72 16.45 18.40 18.54 18.56 18.54
    2 19.86 19.86 730.48 92.88 16.26 18.56 18.54 18.55
    3 19.57 19.57 91.67 725.73 16.27 16.89 18.55 18.55
    4 18.40 18.55 18.55 18.55 726.41 91.59 17.96 25.77
    5 18.55 18.55 18.55 18.55 93.41 782.42 16.71 25.77
    6 18.56 18.56 18.53 18.56 19.89 19.94 782.03 93.38
    7 18.55 18.55 18.54 18.55 20.26 25.77 93.42 782.42
    Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 733.91 17.22 17.30 17.39 27.47 27.47 27.44 27.48
    1 17.24 746.18 17.25 17.37 26.52 27.38 27.41 27.49
    2 17.34 17.36 781.45 17.35 25.21 25.86 25.57 27.54
    3 17.32 17.35 17.27 783.40 25.17 25.85 25.47 25.13
    4 24.99 25.01 25.57 27.71 782.42 17.20 17.11 17.19
    5 26.79 26.84 26.88 27.66 17.36 782.23 17.34 17.29
    6 27.37 27.56 25.55 27.65 17.37 17.29 782.23 17.25
    7 25.35 25.16 25.55 25.17 17.32 17.31 17.38 781.05
    Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
    D\D 0 1 2 3 4 5 6 7
    0 734.09 181.26 33.43 32.90 33.05 33.03 34.29 35.61
    1 181.78 730.48 32.88 32.89 32.44 32.44 32.47 35.47
    2 33.44 33.43 735.12 181.26 34.62 35.65 35.63 35.28
    3 32.89 32.89 181.74 730.65 33.26 35.62 35.59 35.64
    4 35.61 35.62 35.64 35.40 782.03 185.04 51.53 51.53
    5 35.65 35.62 35.63 35.47 185.24 782.23 51.53 51.53
    6 35.60 35.24 35.59 35.64 51.50 51.51 782.62 181.37
    7 32.52 32.49 32.51 32.50 40.37 51.52 185.28 780.66
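
To put a number on the gap between the two runs, the matrices can be compared entry by entry. The following is only a minimal Python sketch: the helper names (parse_matrix, relative_drop, worst_pair) are made up for illustration and are not part of the CUDA samples, and the input is a 4x4 excerpt (GPUs 0-3) of the "Unidirectional P2P=Enabled" matrices above rather than the full output.

```python
def parse_matrix(text):
    """Parse one bandwidth matrix (as printed above) into rows of floats."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the title line and the "D\D 0 1 2 ..." header row.
        if not fields or not fields[0].isdigit():
            continue
        rows.append([float(x) for x in fields[1:]])
    return rows


def relative_drop(locked, default):
    """Percent by which each default entry falls below the locked entry."""
    return [[round(100.0 * (l - d) / l, 1) for l, d in zip(lrow, drow)]
            for lrow, drow in zip(locked, default)]


def worst_pair(drop):
    """Return (src, dst) of the off-diagonal entry with the largest drop."""
    pairs = [(i, j) for i in range(len(drop))
             for j in range(len(drop[i])) if i != j]
    return max(pairs, key=lambda p: drop[p[0]][p[1]])


# Excerpt (GPUs 0-3) of the unidirectional P2P=Enabled matrices from the post.
locked = parse_matrix(r"""
D\D 0 1 2 3
0 776.98 93.38 25.77 25.77
1 93.41 783.21 25.77 25.77
2 25.77 25.77 783.60 93.40
3 25.77 25.76 93.40 782.42
""")

default = parse_matrix(r"""
D\D 0 1 2 3
0 727.08 92.91 16.72 19.93
1 91.58 726.07 16.72 16.45
2 19.86 19.86 730.48 92.88
3 19.57 19.57 91.67 725.73
""")

drop = relative_drop(locked, default)
for src, row in enumerate(drop):
    print(src, row)
print("largest off-diagonal drop at GPU pair:", worst_pair(drop))
```

On this excerpt the pairs that reach about 93 GB/s lose only a couple of percent at default clocks, while several of the pairs that reach 25.77 GB/s when locked lose a quarter to a third of their bandwidth.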