Depthwise convolution is very slow with TensorRT 3.0

I converted a TensorFlow MobileNet model to UFF and profiled it on a TX2 using TensorRT 3.0.
The per-layer times are:

(Unnamed Layer* 0) 0.197ms
MobileNet/conv_1/BiasAdd + MobileNet/conv_1/batch_norm/Relu 0.427ms
MobileNet/conv_ds_2/depthwise_conv/BiasAdd + MobileNet/conv_ds_2/dw_batch_norm/Relu 8.436ms
MobileNet/conv_ds_2/pointwise_conv/BiasAdd + MobileNet/conv_ds_2/pw_batch_norm/Relu 0.731ms
(Unnamed Layer* 13) 0.850ms
MobileNet/conv_ds_3/depthwise_conv/BiasAdd + MobileNet/conv_ds_3/dw_batch_norm/Relu 4.350ms
MobileNet/conv_ds_3/pointwise_conv/BiasAdd + MobileNet/conv_ds_3/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_4/depthwise_conv/BiasAdd + MobileNet/conv_ds_4/dw_batch_norm/Relu 8.709ms
MobileNet/conv_ds_4/pointwise_conv/BiasAdd + MobileNet/conv_ds_4/pw_batch_norm/Relu 0.814ms
(Unnamed Layer* 30) 0.430ms
MobileNet/conv_ds_5/depthwise_conv/BiasAdd + MobileNet/conv_ds_5/dw_batch_norm/Relu 2.753ms
MobileNet/conv_ds_5/pointwise_conv/BiasAdd + MobileNet/conv_ds_5/pw_batch_norm/Relu 0.417ms
MobileNet/conv_ds_6/depthwise_conv/BiasAdd + MobileNet/conv_ds_6/dw_batch_norm/Relu 5.513ms
MobileNet/conv_ds_6/pointwise_conv/BiasAdd + MobileNet/conv_ds_6/pw_batch_norm/Relu 0.751ms
(Unnamed Layer* 47) 0.226ms
MobileNet/conv_ds_7/depthwise_conv/BiasAdd + MobileNet/conv_ds_7/dw_batch_norm/Relu 2.938ms
MobileNet/conv_ds_7/pointwise_conv/BiasAdd + MobileNet/conv_ds_7/pw_batch_norm/Relu 0.433ms
MobileNet/conv_ds_8/depthwise_conv/BiasAdd + MobileNet/conv_ds_8/dw_batch_norm/Relu 5.846ms
MobileNet/conv_ds_8/pointwise_conv/BiasAdd + MobileNet/conv_ds_8/pw_batch_norm/Relu 0.796ms
MobileNet/conv_ds_9/depthwise_conv/BiasAdd + MobileNet/conv_ds_9/dw_batch_norm/Relu 4.293ms
MobileNet/conv_ds_9/pointwise_conv/BiasAdd + MobileNet/conv_ds_9/pw_batch_norm/Relu 0.786ms
MobileNet/conv_ds_10/depthwise_conv/BiasAdd + MobileNet/conv_ds_10/dw_batch_norm/Relu 4.885ms
MobileNet/conv_ds_10/pointwise_conv/BiasAdd + MobileNet/conv_ds_10/pw_batch_norm/Relu 0.787ms
MobileNet/conv_ds_11/depthwise_conv/BiasAdd + MobileNet/conv_ds_11/dw_batch_norm/Relu 5.855ms
MobileNet/conv_ds_11/pointwise_conv/BiasAdd + MobileNet/conv_ds_11/pw_batch_norm/Relu 0.748ms
MobileNet/conv_ds_12/depthwise_conv/BiasAdd + MobileNet/conv_ds_12/dw_batch_norm/Relu 4.874ms
MobileNet/conv_ds_12/pointwise_conv/BiasAdd + MobileNet/conv_ds_12/pw_batch_norm/Relu 0.791ms
(Unnamed Layer* 96) 0.118ms
MobileNet/conv_ds_13/depthwise_conv/BiasAdd + MobileNet/conv_ds_13/dw_batch_norm/Relu 5.715ms
MobileNet/conv_ds_13/pointwise_conv/BiasAdd + MobileNet/conv_ds_13/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_14/depthwise_conv/BiasAdd 10.942ms
Time over all layers: 85.414ms
Why does the depthwise conv cost so much time?


May I know which data format you use, NCHW or NHWC?


Hi AastaLLL,

The TensorFlow depthwise conv API only supports NHWC, so I use the NHWC data format.
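For readers unfamiliar with the two layouts: NHWC stores channels last, while TensorRT works in NCHW (channels first), so activations coming from TensorFlow need a permutation. A minimal NumPy sketch (my own illustration with made-up shapes, not code from either framework):

```python
import numpy as np

# Illustration only: a dummy NHWC activation, batch=1, 4x4 spatial, 3 channels.
x_nhwc = np.arange(1 * 4 * 4 * 3, dtype=np.float32).reshape(1, 4, 4, 3)

# Permute (N, H, W, C) -> (N, C, H, W) to get the NCHW layout TensorRT expects.
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))

print(x_nchw.shape)  # (1, 3, 4, 4)
```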



Currently, separable convolution is implemented as a grouped convolution (groups = C) followed by a 1x1 convolution, and it’s not efficient enough.
We’re looking at the possibility of optimizing general grouped convolutions, but we can’t provide any firm commitments or estimates at this time.

Thanks and sorry for the inconvenience.
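As an aside, the decomposition mentioned above (depthwise = grouped conv with groups = C, then a 1x1 pointwise conv) can be sketched naively in NumPy. This is my own illustration, not TensorRT or cuDNN code; function names, shapes, and the stride-1/valid-padding assumptions are mine:

```python
import numpy as np

def depthwise_conv2d(x, dw_filter):
    """Naive depthwise conv (stride 1, 'valid' padding), HWC layout.
    x: (H, W, C); dw_filter: (kH, kW, C) -- one kH x kW filter per channel."""
    H, W, C = x.shape
    kH, kW, _ = dw_filter.shape
    out = np.zeros((H - kH + 1, W - kW + 1, C), dtype=x.dtype)
    for c in range(C):  # groups == C: each channel is convolved independently
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, c] = np.sum(x[i:i + kH, j:j + kW, c] * dw_filter[:, :, c])
    return out

def pointwise_conv2d(x, pw_filter):
    """1x1 conv that mixes channels. x: (H, W, C); pw_filter: (C, C_out)."""
    return x @ pw_filter

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 4)).astype(np.float32)
dw = rng.standard_normal((3, 3, 4)).astype(np.float32)
pw = rng.standard_normal((4, 8)).astype(np.float32)

# Depthwise-separable conv = per-channel spatial filter + channel-mixing 1x1.
y = pointwise_conv2d(depthwise_conv2d(x, dw), pw)
print(y.shape)  # (4, 4, 8)
```

The inner per-channel loops make clear why this maps poorly to GPU convolution kernels tuned for many-channel dot products: each output element touches only one input channel.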


@373197201, can you please specify which implementation of “tensorflow mobilenet” you were using?

Hi, any chance of seeing depthwise convolutions better optimized in cuDNN?
We have implemented our own CUDA kernels, but would like more optimal convolutions such as Winograd.


We’re looking at the possibility, but we can’t provide any firm commitments or estimates at this time.

Hi AastaLLL:

Has the issue been solved?



I’m trying to convert TensorFlow’s frozen default MobileNet to UFF format with the uff.from_tensorflow_frozen_model method.

I am facing the following error:

AttributeError: ‘RepeatedCompositeFieldContainer’ object has no attribute ‘unknown_rank’

Did you (@373197201) face any such issue?

Any help will be appreciated. Thanks!!

Hi, 373197201

The improvement is on our roadmap, but we cannot disclose a concrete schedule.
Please watch our announcements for the latest updates.


Hi, gautam.patel

This issue comes from a pure TensorFlow use case.
It’s recommended to share your issue with the TensorFlow developers.


Hi 373197201,

Could you please share your wisdom on how to convert the frozen ssd_mobilenet_v2_coco model to UFF format? I have been trying for a few days now but haven’t been successful.

Kindly help me out.