I'm testing NCCL with a B300. Is this the right speed?

I’m currently testing NCCL across two Dell XE9780 nodes (8x B300 each). No matter how much I tune all_reduce, it won’t go above 280 GB/s.
Has anyone run this test on B300?
Or is 280 GB/s the expected speed?
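
For anyone comparing numbers: whether 280 is reasonable depends on which column it comes from. The sketch below assumes the figure is from nccl-tests' all_reduce_perf, which reports both algorithm bandwidth (algbw) and bus bandwidth (busbw), related for all_reduce by busbw = algbw * 2(n-1)/n. The rank count of 16 (2 nodes x 8 GPUs) is taken from the setup above; the function names are only illustrative.

```python
# Convert between the two bandwidth columns that nccl-tests prints for all_reduce.
# Assumption: the 280 figure comes from all_reduce_perf; the post does not say
# whether it is algbw or busbw, so both readings are shown.

def allreduce_busbw(algbw_gbs: float, n_ranks: int) -> float:
    """Bus bandwidth from algorithm bandwidth: busbw = algbw * 2*(n-1)/n."""
    return algbw_gbs * 2 * (n_ranks - 1) / n_ranks


def allreduce_algbw(busbw_gbs: float, n_ranks: int) -> float:
    """Inverse of allreduce_busbw."""
    return busbw_gbs * n_ranks / (2 * (n_ranks - 1))


N_RANKS = 16       # 2 nodes x 8 GPUs, as described in the post
MEASURED = 280.0   # GB/s, the number reported in the post

print(f"if 280 is busbw, algbw = {allreduce_algbw(MEASURED, N_RANKS):.1f} GB/s")
print(f"if 280 is algbw, busbw = {allreduce_busbw(MEASURED, N_RANKS):.1f} GB/s")
```

busbw is the column that is meant to be compared against hardware link speed, so it is worth stating which of the two the 280 refers to.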
I couldn’t get XDR working on the ConnectX-8 (CX-8) cards, so I’m currently running them as NDR (400G).
The OS is Ubuntu 24.04.4, NCCL is 2.30.4, and hpc-benchmarks is 26.02.
Could falling back to NDR be the reason it doesn’t perform properly?
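
As a rough sanity check on the NDR question, here is a back-of-the-envelope sketch of the raw network ceiling per node. It assumes one NIC port per GPU (8 per node), which the post does not state, and it ignores protocol overhead, so the real achievable number will be somewhat lower. The helper names are hypothetical.

```python
# Raw line-rate ceiling for inter-node traffic, per node.
# Assumptions (not stated in the post): 8 NIC ports per node, one per GPU,
# all of them used by NCCL; protocol and encoding overhead ignored.

def port_gbytes_per_s(link_gbits: float) -> float:
    """Convert a link speed in Gb/s to GB/s (8 bits per byte)."""
    return link_gbits / 8.0


def node_network_ceiling(link_gbits: float, nics_per_node: int) -> float:
    """Aggregate one-direction NIC bandwidth of a node in GB/s."""
    return port_gbytes_per_s(link_gbits) * nics_per_node


NICS_PER_NODE = 8   # assumption: one port per GPU
MEASURED = 280.0    # GB/s, from the post

ndr = node_network_ceiling(400, NICS_PER_NODE)   # NDR, 400 Gb/s per port
xdr = node_network_ceiling(800, NICS_PER_NODE)   # XDR, 800 Gb/s per port

print(f"NDR ceiling: {ndr:.0f} GB/s, measured/ceiling = {MEASURED / ndr:.0%}")
print(f"XDR ceiling: {xdr:.0f} GB/s, measured/ceiling = {MEASURED / xdr:.0%}")
```

Under these assumptions, and if 280 GB/s is bus bandwidth, the result sits at roughly 70% of the NDR line rate, so NDR alone would cap the number but may not fully explain it.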
HPL is also only reaching about 30% efficiency…
Is anyone else experiencing the same problem?