HPL fails when switching P and Q


I’m running HPL on several nodes with four H100 GPUs each. As long as P < Q, I see no problems. From five nodes onwards, however, when I swap the values of P and Q (in this case 4 x 5 → 5 x 4), the run crashes with various InfiniBand messages. With fewer nodes I can swap the parameters without issue, e.g. 4 x 3 → 3 x 4.
Is there any requirement that P must not exceed the number of GPUs per node?
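
To make the question concrete, here is a small sketch of how a P x Q grid could land on the nodes. It rests on assumptions I have not verified for my setup: that MPI ranks are placed on nodes consecutively (ranks 0 to 3 on the first node, and so on), and that the grid uses column-major process mapping. The reference HPL input file has a PMAP line that selects row- or column-major mapping, so the sketch takes that as a parameter. The function names are made up for illustration.

```python
def grid_to_nodes(p, q, gpus_per_node=4, column_major=True):
    """Return a p x q table holding the node index of each grid position.

    Assumes ranks fill nodes consecutively (hypothetical placement).
    """
    table = [[0] * q for _ in range(p)]
    for rank in range(p * q):
        if column_major:
            col, row = divmod(rank, p)  # ranks run down each column first
        else:
            row, col = divmod(rank, q)  # ranks run along each row first
        table[row][col] = rank // gpus_per_node
    return table


def columns_spanning_nodes(table):
    """Count grid columns whose ranks sit on more than one node."""
    p, q = len(table), len(table[0])
    return sum(1 for c in range(q) if len({table[r][c] for r in range(p)}) > 1)


if __name__ == "__main__":
    for p, q in [(4, 5), (5, 4), (4, 3), (3, 4)]:
        table = grid_to_nodes(p, q)
        print(f"P={p} Q={q}: {columns_spanning_nodes(table)} of {q} columns span nodes")
        for row in table:
            print("  ", row)
```

Under these assumptions, 4 x 5 and 4 x 3 keep every grid column inside a single node, while 5 x 4 spreads every column across two nodes. This does not by itself explain the crash, since 3 x 4 also has columns that span nodes and still runs fine for me.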