I found that Large Kernel DepthWise convolutions (LKDWconvs) built with torch.nn.Conv2d (pytorch/issues/85252) are not as fast as their counterparts in MegEngine.
I suspect this is because torch.nn.Conv2d relies on cuDNN, whereas MegEngine uses a self-developed CUDA operator specially optimized for LKDWconvs.
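One way to test the cuDNN hypothesis is to time the same depthwise layer with cuDNN enabled and then disabled; with cuDNN disabled, PyTorch falls back to its own non-cuDNN convolution kernels. Below is a minimal sketch assuming a CUDA-capable PyTorch build. The tensor sizes, kernel sizes, iteration count, and the helper names `make_lk_dwconv` and `time_forward` are hypothetical choices for illustration, not taken from the linked issue. On a CPU-only machine it still runs, but the cuDNN toggle then has no effect.

```python
import time

import torch
import torch.nn as nn


def make_lk_dwconv(channels: int, kernel_size: int) -> nn.Conv2d:
    # Hypothetical helper. Depthwise conv: groups == in_channels == out_channels,
    # so each channel is filtered by its own kernel_size x kernel_size kernel.
    return nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2,  # "same" output size for odd kernels
                     groups=channels, bias=False)


def time_forward(conv: nn.Module, x: torch.Tensor,
                 use_cudnn: bool = True, iters: int = 10) -> float:
    """Hypothetical helper: mean forward time in ms with cuDNN toggled on or off."""
    # cudnn.flags() is a context manager that restores the previous settings on exit.
    with torch.no_grad(), torch.backends.cudnn.flags(enabled=use_cudnn,
                                                     benchmark=use_cudnn):
        conv(x)  # warm-up; with benchmark=True this also lets cuDNN pick an algorithm
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # CUDA launches are async; wait before stopping the clock
        return (time.perf_counter() - start) * 1000 / iters


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1, 64, 56, 56, device=device)  # hypothetical N, C, H, W
    for k in (3, 7, 13, 31):
        conv = make_lk_dwconv(64, k).to(device)
        t_on = time_forward(conv, x, use_cudnn=True)
        t_off = time_forward(conv, x, use_cudnn=False)
        print(f"{device} k={k:2d}: cuDNN {t_on:7.2f} ms | no cuDNN {t_off:7.2f} ms")
```

If the cuDNN column grows much faster with kernel size than the non-cuDNN column, that would point at the cuDNN path specifically; a direct comparison with MegEngine would still need the same shapes timed there.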
Any answer would be appreciated!