Large-kernel depthwise convolutions (LKDWconvs) with torch.nn.Conv2d (cuDNN) are not as fast as their MegEngine counterpart

Hi everyone,
I found that a large-kernel depthwise convolution (LKDWconv) implemented with torch.nn.Conv2d (pytorch/issues/85252) is not as fast as its MegEngine counterpart.
My guess is that this is because torch.nn.Conv2d relies on cuDNN, while MegEngine uses a self-developed CUDA operator specially optimized for LKDWconvs.
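
For reference, here is a minimal sketch of how the PyTorch side could be timed. The channel count, kernel size, batch size, and input resolution are placeholder values chosen for illustration, not the configuration from the issue, and the helper names are hypothetical. It measures the forward pass only.

```python
import time


def depthwise_conv_kwargs(channels, kernel_size):
    """Conv2d arguments for a depthwise conv that keeps H x W (odd kernel_size)."""
    return {
        "in_channels": channels,
        "out_channels": channels,
        "kernel_size": kernel_size,
        "padding": kernel_size // 2,  # "same" padding for an odd kernel
        "groups": channels,           # one filter per channel = depthwise
        "bias": False,
    }


def time_depthwise_conv(channels=64, kernel_size=31, batch=8, size=56, iters=20):
    """Return the mean forward time in milliseconds (all defaults are assumptions)."""
    # Imported here so the helper above stays usable without PyTorch installed.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    conv = torch.nn.Conv2d(**depthwise_conv_kwargs(channels, kernel_size)).to(device)
    x = torch.randn(batch, channels, size, size, device=device)

    with torch.no_grad():
        for _ in range(3):  # warm-up so one-time setup cost is not measured
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()  # CUDA kernels launch asynchronously
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3
```

Calling time_depthwise_conv() with several kernel_size values (for example 3, 13, 31) should show how the forward time grows with kernel size on a given GPU; the MegEngine side would need an equivalent script for a fair comparison.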

Your answer will be appreciated!

Hi @466309936,
Apologies for the delayed response.
Your guess seems reasonable; however, I need to confirm it with our team.
Kindly allow us some time.
Thank you for your patience.
