Where is the procedure for packing the network protocol headers in RoCE v2?

It is expected behavior.

An HCA can support up to 16 million QPs, but there are only 64K UDP port numbers, and some of those are reserved for specific applications. So the mapping between QPs and src UDP ports is not 1:1 but N:1: a group of QPs always shares one src UDP port. The main purpose of the scrambled src UDP port is load balancing in the network, and an N:1 mapping is good enough for that.
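To make the N:1 mapping concrete, here is a minimal sketch of how a 24-bit QP number could be folded into a src UDP port. This is only an illustration, not the algorithm any particular HCA or driver uses: the function name `src_udp_port`, the XOR fold, and the choice of the port range 0xC000-0xFFFF are all assumptions made for the example.

```python
# Hypothetical illustration of an N:1 mapping from QP number to src UDP port.
# The fold below and the port range are assumptions, not a vendor algorithm.

QPN_BITS = 24        # 2**24 = ~16 million possible QP numbers
PORT_BASE = 0xC000   # assumed start of the usable src port range
PORT_MASK = 0x3FFF   # 14 bits of entropy -> 16384 distinct ports


def src_udp_port(qpn: int) -> int:
    """Fold a 24-bit QP number into a port in 0xC000..0xFFFF."""
    if not 0 <= qpn < (1 << QPN_BITS):
        raise ValueError("QPN must fit in 24 bits")
    # XOR the upper 10 bits of the QPN into the lower 14 bits so that
    # both halves of the QPN influence the resulting port.
    folded = (qpn ^ (qpn >> 14)) & PORT_MASK
    return PORT_BASE | folded


if __name__ == "__main__":
    for qpn in (0x000000, 0x004001, 0x000123):
        print(f"QPN 0x{qpn:06x} -> src UDP port 0x{src_udp_port(qpn):04x}")
```

With this fold, 2^24 QP numbers share 2^14 ports, so exactly 1024 QPs land on each port. For example, QPNs 0x000000 and 0x004001 both map to port 0xC000. Different flows still tend to get different ports, which is all that ECMP-style load balancing needs.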