I tried running real-time inference in Docker on WSL2.
Inference that takes about 30 ms on native Linux takes about 80 ms in the WSL2 environment.
(The inference is face detection with an MTCNN model implemented in PyTorch.)
Host OS : Windows 11
WSL type : WSL2 (Ubuntu 20.04)
Docker image : nvcr.io/nvidia/tensorflow:20.10-tf1-py3 (Ubuntu 18.04.5)
GPU : RTX 3080 (Laptop)
CUDA : 11.1
(I need to use both TensorFlow 1.15 and PyTorch in my software, so I installed PyTorch on top of the Docker image above.)
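For reference, this is roughly how I measure the 30 ms / 80 ms numbers. The helper below is a generic sketch (the model and input names are placeholders, not my actual code); the important part is calling `torch.cuda.synchronize` before stopping the timer, otherwise only the kernel-launch time is measured, not the GPU work itself:

```python
import time
import statistics

def summarize_latencies(samples_ms):
    """Return (mean, p95) of a list of per-inference latencies in ms."""
    ordered = sorted(samples_ms)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return statistics.mean(samples_ms), p95

def benchmark(infer, n_warmup=10, n_iter=100, synchronize=None):
    """Time infer() n_iter times after n_warmup warm-up calls.

    Pass torch.cuda.synchronize as `synchronize` so the timer waits for
    the GPU to finish; without it the measured time is launch-only.
    """
    for _ in range(n_warmup):
        infer()  # warm-up: JIT/cuDNN autotuning, allocator growth, etc.
    samples = []
    for _ in range(n_iter):
        start = time.perf_counter()
        infer()
        if synchronize is not None:
            synchronize()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize_latencies(samples)
```

Used for example as `benchmark(lambda: model(frame), synchronize=torch.cuda.synchronize)`, where `model` and `frame` stand in for the MTCNN detector and an input image.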
- In a WSL2 environment, does inference on small batches cause slowdowns?
- Could running in Docker on top of WSL2 slow inference down even further?
- Are there any settings that avoid these slowdowns?
What I have looked at
I read the following post about WSL2.
Looking at Figure 4, WSL2 appears to be at a speed disadvantage when the batch size is small.
On the other hand, Figure 8 shows that asynchronous submission makes CUDA launches from WSL2 faster.
Does Figure 8 describe a method that can also run faster at small batch sizes?
I can't understand the article well, so I would like to know whether this delay can be resolved.
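My reading of the Figure 4 behaviour (this is my own assumption, not something stated in the article) is that each kernel launch pays a roughly fixed submission overhead, which is larger under WSL2, and small batches amortize it poorly. A toy model with made-up numbers illustrates the shape of the effect:

```python
# Toy cost model (illustrative numbers only, not measurements):
# total time = fixed per-launch overhead + per-item compute cost.
def per_item_latency_ms(batch_size, launch_overhead_ms, per_item_ms):
    """Average latency per item when a batch shares one launch overhead."""
    return (launch_overhead_ms + batch_size * per_item_ms) / batch_size

for b in (1, 4, 16, 64):
    native = per_item_latency_ms(b, launch_overhead_ms=0.05, per_item_ms=1.0)
    wsl2 = per_item_latency_ms(b, launch_overhead_ms=0.50, per_item_ms=1.0)
    print(f"batch={b:3d}  native={native:.3f} ms/item  wsl2={wsl2:.3f} ms/item")
```

Under this model the WSL2 penalty is largest at batch size 1 and shrinks as the batch grows, which matches what Figure 4 seems to show; the asynchronous submission in Figure 8 would then help by reducing the effective per-launch overhead itself.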