DLA bugs using DeepLabV3+-style network

Hi,

I have encountered some issues with running networks similar to DeepLabV3+ (https://arxiv.org/pdf/1802.02611.pdf) for semantic segmentation on the DLA:

  1. According to the official NVIDIA docs, dilation is supported in a convolution layer, but the padding must be less than the kernel size. This effectively means that to preserve the feature-map height and width after a dilated convolution, only a dilation of 2 is allowed, since the required padding is dilation*(kernel_size-1)/2. Padding layers are not supported on the DLA, so is there some way to run larger dilation values on the DLA?
  2. There is a bug in deconvolution layers when the number of output channels is greater than 16. The model runs on the DLA but produces incorrect results. Here is an example:
    deconv_test.txt (2.5 KB)
  3. Concatenation sometimes doesn't work on the DLA and requires GPU fallback, even when both layers being concatenated have their outputs in DLA memory.
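As a quick sanity check on the padding constraint in item 1, here is a small plain-Python sketch (no TensorRT involved; the helper names are my own shorthand) that computes the "same" padding required by a dilated 3x3 convolution and tests it against the documented rule that padding must be less than the kernel size:

```python
# "Same" padding for a stride-1 dilated convolution:
#   pad = dilation * (kernel_size - 1) // 2
# DLA rule discussed above: padding must be < kernel_size.

def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding needed to preserve the feature-map size at stride 1."""
    return dilation * (kernel_size - 1) // 2

def dla_allows(kernel_size: int, dilation: int) -> bool:
    """True if the required 'same' padding satisfies padding < kernel_size."""
    return same_padding(kernel_size, dilation) < kernel_size

for d in (1, 2, 4, 6, 12, 18):
    pad = same_padding(3, d)
    print(f"3x3 conv, dilation {d:2d}: pad {pad:2d}, fits DLA rule: {dla_allows(3, d)}")
```

For a 3x3 kernel the required padding equals the dilation, so only dilations 1 and 2 satisfy padding < 3.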

All of these are with 4.4 DP and TensorRT 7.

Hi,

1. Based on our documentation, the dilation can be in the range [1, 32]:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers

2. We are going to test this. Will let you know later.

3. The DLA has limited capacity.
If the model exceeds it, TensorRT will move the remaining layers back to the GPU.
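For reference, that fallback is opt-in when the engine is built. A minimal TensorRT 7 Python builder-configuration fragment (not a complete build script) looks like this:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
config = builder.create_builder_config()

# Prefer the DLA for every layer that supports it...
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
# ...and let TensorRT move unsupported layers back to the GPU.
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
```

Without the GPU_FALLBACK flag, engine building fails as soon as any layer cannot run on the DLA.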

Thanks.

Hi,

Could you share what kind of incorrect output you observe?
We can reproduce a precision issue in our environment.
We are going to report this issue to our internal team, but first want to check whether it matches what you are seeing.

[DLA]

-0.360107 -0.47998 -0.568359 -0.537109 -0.505371 -0.473633 -0.292236 -0.142578 0.00717545 0.156982 0.235474 0.164307 0.0929565 0.0216522 -0.0318298 -0.0140457 0.00375366 

[GPU]

-0.359994 -0.479992 -0.568423 -0.536856 -0.505289 -0.473722 -0.292378 -0.142602 0.00717399 0.15695 0.235457 0.164188 0.0929186 0.0216494 -0.0318317 -0.0140436 0.00374448
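To quantify the gap between the two printouts, one can diff them directly (values copied from the printouts above):

```python
dla = [-0.360107, -0.47998, -0.568359, -0.537109, -0.505371, -0.473633,
       -0.292236, -0.142578, 0.00717545, 0.156982, 0.235474, 0.164307,
       0.0929565, 0.0216522, -0.0318298, -0.0140457, 0.00375366]
gpu = [-0.359994, -0.479992, -0.568423, -0.536856, -0.505289, -0.473722,
       -0.292378, -0.142602, 0.00717399, 0.15695, 0.235457, 0.164188,
       0.0929186, 0.0216494, -0.0318317, -0.0140436, 0.00374448]

# Largest element-wise deviation between the two runs.
max_abs_diff = max(abs(a - b) for a, b in zip(dla, gpu))
print(f"max |DLA - GPU| = {max_abs_diff:.6f}")
```

The differences here are on the order of 1e-4, which would be consistent with ordinary DLA FP16 precision loss rather than the grossly wrong images reported for more than 16 deconvolution output channels.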

Thanks.

I have tested with a dilation of 6 on the DLA, and the error message I get is that the padding must be less than the kernel size.
The image outputs are below. The first is the model with 16 deconvolution output channels running on the DLA; the second, 16 output channels on the GPU; the third, 17 output channels on the DLA; the fourth, 17 output channels on the GPU.

deconv16_dla
deconv16_gpu
deconv17_dla
deconv17_gpu
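For anyone who wants a numeric rather than visual comparison, a single-channel reference transposed convolution is easy to write and can be diffed against an engine's output. This is a sketch in the scatter form, under my own assumptions about layout (2-D single-channel, no output padding), not the DLA's implementation:

```python
import numpy as np

def deconv2d_ref(x: np.ndarray, k: np.ndarray, stride: int = 1) -> np.ndarray:
    """Single-channel transposed convolution (scatter form), no output padding."""
    hi, wi = x.shape
    kh, kw = k.shape
    ho = (hi - 1) * stride + kh
    wo = (wi - 1) * stride + kw
    out = np.zeros((ho, wo), dtype=x.dtype)
    for i in range(hi):
        for j in range(wi):
            # Each input pixel scatters a scaled copy of the kernel.
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

# A 1x1 input just reproduces the scaled kernel: a 3x3 block of 2s.
print(deconv2d_ref(np.array([[2.0]]), np.ones((3, 3))))
```

Running per-channel references like this for 16 vs. 17 output channels would pinpoint exactly which channels the DLA computes incorrectly.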

Hi,

Thanks for the clarification.
So the difference starts at 17 output channels, is that correct?

Thanks.

Yes. Less than or equal to 16 works, but greater than or equal to 17 fails.

Hi,

We can reproduce this in our environment.
We will share more information once we find anything.

Thanks.

Could you confirm whether a convolution layer with kernel size 3, dilation 6, and padding 6 can run on the DLA? I was not able to run it on the DLA, and it required GPU fallback.