First, I would remove the call to send and check that the measurement itself does not add latency. The measured latency should be ~0 in this case.
The call to send should be handled by VMA in this case.
java.net.Socket does have its price. Internally, Java calls the send syscall, or another write syscall, which VMA intercepts.
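A minimal sketch of that check, assuming the measurement uses `std::chrono::steady_clock` (the original post does not say which timer is used). The helper name `median_timer_overhead_ns` is hypothetical. It times an empty region where send would normally sit, so the result approximates the cost of the two clock reads alone:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of VMA): times an empty region `iterations`
// times and returns the median elapsed time in nanoseconds.
inline std::int64_t median_timer_overhead_ns(std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // assumption: same clock as the real test
    std::vector<std::int64_t> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        // The send() call would normally go here; it is left out on purpose.
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    if (samples.empty()) {
        return 0;
    }
    // Partial sort so that the middle element is the median.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

If the value returned for a few thousand iterations is not small compared with the latency being measured, the timer itself is contributing to the numbers.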
AFAIK, kernel latency is several microseconds, while VMA is around 1-2 microseconds depending on the setup. So if most of the latency comes from Java, a 1-2 microsecond difference may not be visible because it falls within the 5-7 fluctuation. You can run several tests with and without VMA to see the tendency.
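One way to look at the tendency across runs is to compare medians rather than single measurements. This is only a sketch: the helper names `percentile` and `median_shift` are hypothetical, and the nearest-rank percentile is my choice, not something prescribed by VMA. The samples are assumed to be latencies in microseconds collected from runs with and without VMA:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: nearest-rank percentile (p in [0, 100]) of a sample set.
// Takes the vector by value so the caller's data is not reordered.
inline double percentile(std::vector<double> samples, double p) {
    if (samples.empty()) {
        return 0.0;
    }
    std::sort(samples.begin(), samples.end());
    const double rank = std::ceil(p / 100.0 * static_cast<double>(samples.size()));
    std::size_t idx = rank < 1.0 ? 0 : static_cast<std::size_t>(rank) - 1;
    idx = std::min(idx, samples.size() - 1);
    return samples[idx];
}

// Hypothetical helper: how much lower the median latency is with VMA.
// A positive result means the runs with VMA were faster at the median.
inline double median_shift(const std::vector<double>& without_vma,
                           const std::vector<double>& with_vma) {
    return percentile(without_vma, 50.0) - percentile(with_vma, 50.0);
}
```

If the shift stays well below the run-to-run fluctuation over many runs, the Java side is probably dominating the measurement.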
A deeper understanding of all the hardware/software components and the scenario will require a support case in the NVIDIA portal (you can send an email to enterprisesupport@nvidia.com and the case will be handled according to the entitlement).
Thanks for the reply. We are now moving the dummy message approach from the testing phase to production, but we have a problem sending dummy messages on the production site to the real server.
auto n2 = send(m_Socket, sendPtr, sendSize, MSG_NOSIGNAL | VMA_SND_FLAGS_DUMMY);
auto n = send(m_Socket, sendPtr, sendSize, MSG_NOSIGNAL);
The first line always comes back with the error “Resource temporarily unavailable”, while the second line has no problem.
We verified the “dummy send” capability in HW using the VMA_TRACELEVEL=debug approach and confirmed QP=1.
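For reference, “Resource temporarily unavailable” is the Linux message for EAGAIN. A small sketch of how the failing call could be wrapped so the exact errno is captured next to the return value; `SendResult`, `send_and_report`, and `is_would_block` are hypothetical names, not part of VMA, and the flags are passed in unchanged:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical result type: return value of send() plus the errno it left behind.
struct SendResult {
    ssize_t bytes;        // value returned by send()
    int err;              // errno captured right after a failed call, else 0
    std::string message;  // strerror() text for err, empty on success
};

// Hypothetical wrapper: calls send() with the given flags and records errno
// immediately, before any other call can overwrite it.
inline SendResult send_and_report(int fd, const void* buf, std::size_t len, int flags) {
    errno = 0;
    const ssize_t n = ::send(fd, buf, len, flags);
    const int saved = (n < 0) ? errno : 0;
    return SendResult{n, saved, saved != 0 ? std::strerror(saved) : ""};
}

// True when the failure is EAGAIN/EWOULDBLOCK, i.e. the error text seen above.
inline bool is_would_block(const SendResult& r) {
    return r.bytes < 0 && (r.err == EAGAIN || r.err == EWOULDBLOCK);
}
```

Calling `send_and_report(m_Socket, sendPtr, sendSize, MSG_NOSIGNAL | VMA_SND_FLAGS_DUMMY)` and logging `err` alongside the VMA debug trace should make it easier to see whether every dummy send fails the same way on the production site.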