If you see the same behaviour without VMA, why complicate the problem? Start by tuning the system and see whether that helps; adding more components will not make troubleshooting easier. After tuning, I would suggest checking 'netstat -s' (or 'nstat') for error and drop counters, and 'netstat -unp' to check the receive queue size (the Recv-Q column).
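As a rough sketch of the counter check above: on Linux the UDP counters that 'netstat -s' and 'nstat' report are also exposed in /proc/net/snmp, so you can pull out just the ones that matter for receive-side drops. The helper name udp_counter is my own invention for illustration, and I am assuming a kernel that reports the usual InErrors and RcvbufErrors fields.

```shell
#!/bin/sh
# udp_counter NAME
# Reads text in /proc/net/snmp format on stdin and prints the value of the
# UDP counter NAME (for example RcvbufErrors). Prints nothing if NAME is absent.
# Hypothetical helper, not part of netstat, nstat or any Mellanox tool.
udp_counter() {
    awk -v name="$1" '
        # First "Udp:" line is the header: remember which column holds NAME.
        $1 == "Udp:" && !seen {
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            seen = 1
            next
        }
        # Second "Udp:" line holds the values: print the remembered column.
        $1 == "Udp:" && col { print $col; exit }
    '
}

# Print the current values if the file is readable (Linux only).
if [ -r /proc/net/snmp ]; then
    for c in InErrors RcvbufErrors; do
        printf '%s=%s\n' "$c" "$(udp_counter "$c" < /proc/net/snmp)"
    done
fi
```

Run it before and after a test: if RcvbufErrors grows while traffic is flowing, the socket receive buffer is overflowing, which points at buffer sizing or a slow reader rather than at the adapter.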
The tuning guides are available from the Mellanox site: Performance Tuning for Mellanox Adapters, https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters
You might also check the current number of send/receive queues configured on the interface (ethtool -l <interface>) and try limiting it to 16:
ethtool -L <interface> rx 16 tx 16