Slow real-time inference using WSL2

I tried running real-time inference in Docker on WSL2.
The inference, which takes about 30 ms in the Linux environment, takes about 80 ms in the WSL2 environment.
(The inference is face detection with a model named MTCNN, implemented in PyTorch.)
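
For reference, here is a minimal sketch of how per-call latency figures like these could be measured. It is not the measurement code used for the numbers above. The `benchmark` and `summarize` helpers and the `fake_inference` workload are hypothetical names made up for illustration. For a PyTorch model running on the GPU, one would pass `torch.cuda.synchronize` as the `sync` argument, because CUDA calls return before the GPU work has finished.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100, sync=None):
    """Return a list of per-call latencies in milliseconds for fn()."""
    # Warm-up calls absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Condense a list of latencies into a few summary statistics."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Stand-in CPU workload (an assumption); replace it with the real model call.
    def fake_inference():
        sum(i * i for i in range(10_000))

    print(summarize(benchmark(fake_inference, warmup=5, iters=50)))
```

Running the same script on Linux and inside the WSL2 container would give directly comparable numbers.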

My environment
Host OS : Windows 11
WSL type : WSL2 (Ubuntu 20.04)
Docker image : nvcr.io/nvidia/tensorflow:20.10-tf1-py3 (Ubuntu 18.04.5)
GPU : RTX 3080 (laptop)
CUDA : 11.1
(I need to use both TensorFlow 1.15 and PyTorch in my software, so I installed PyTorch on top of the above Docker image.)

My Questions

  • In a WSL2 environment, does performing inference on small batches cause slowdowns?
  • Is there a possibility that inference on a Docker environment on WSL2 will further slow down the inference speed?
  • Are there any settings to avoid these slowdowns?

Other things I looked at
I read the following post about WSL2.
https://developer.nvidia.com/blog/leveling-up-cuda-performance-on-wsl2-with-new-enhancements/

Looking at Figure 4, it appears that WSL2 is at a speed disadvantage when batches are small.
On the other hand, Figure 8 shows that asynchronous communication makes CUDA startup from WSL2 faster.

Does Figure 8 introduce a method that runs faster with smaller batches?
I don't fully understand the article, so I would like to know whether it is possible to resolve this delay.

Hello,

  • In a WSL2 environment, does performing inference on small batches cause slowdowns?

Yes, the bigger the workload, the less overhead you will see.
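
As a rough illustration of why larger workloads hide the overhead, here is a toy model that assumes a fixed cost per call plus a compute cost that grows with batch size. The model itself, the `overhead_fraction` helper, and the numbers (5 ms overhead, 1 ms per item) are made up for illustration and are not measurements of WSL2.

```python
def overhead_fraction(launch_overhead_ms, compute_ms_per_item, batch_size):
    """Share of the total call time spent on the fixed per-call overhead."""
    total_ms = launch_overhead_ms + compute_ms_per_item * batch_size
    return launch_overhead_ms / total_ms


# Hypothetical numbers: 5 ms fixed overhead, 1 ms of compute per item.
for batch_size in (1, 8, 64):
    share = overhead_fraction(
        launch_overhead_ms=5.0,
        compute_ms_per_item=1.0,
        batch_size=batch_size,
    )
    print(f"batch={batch_size:3d}  overhead share={share:.1%}")
```

Under this assumption the overhead share shrinks as the batch grows, which fits a single-frame real-time workload being hit hardest.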

  • Is there a possibility that inference on a Docker environment on WSL2 will further slow down the inference speed?

Usually the slowdown introduced by the container is minor.

  • Are there any settings to avoid these slowdowns?

Although there is no way to completely avoid the slowdown, make sure of the following: