Hi,
I have encountered some issues running networks similar to DeepLab v3+ (https://arxiv.org/pdf/1802.02611.pdf) for semantic segmentation on the DLA:
- According to the official NVIDIA docs, dilation is supported in a convolution layer, but the padding must be less than the kernel size. Since the padding needed to preserve the feature-map height and width after a dilated convolution is dilation*(kernel_size-1)/2, this effectively means that for a 3x3 kernel only a dilation of 2 (or 1) is allowed (see the sketch at the end of this post). Padding layers are not supported on the DLA, so is there some way to run larger dilation values on the DLA?
- There is a bug in deconvolution layers when the number of output channels is more than 16. The model runs on the DLA but produces incorrect results. Here is an example:
deconv_test.txt (2.5 KB)
- Concatenation sometimes doesn't work on the DLA and requires GPU fallback, even when both layers being concatenated have their outputs in DLA memory.
All of these are with JetPack 4.4 DP and TensorRT 7.
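To make issue 1 concrete, here is a small sketch of the arithmetic (plain Python, nothing DLA-specific) showing which dilations can keep the feature-map size under the documented padding < kernel_size restriction; the dilation values 6/12/18 are the usual DeepLab v3+ ASPP rates:

```python
# "Same" padding needed by a dilated convolution with stride 1.
def same_padding(kernel_size, dilation):
    return dilation * (kernel_size - 1) // 2

# Documented DLA restriction: padding must be less than the kernel size.
def fits_dla(kernel_size, dilation):
    return same_padding(kernel_size, dilation) < kernel_size

for d in (1, 2, 6, 12, 18):
    pad = same_padding(3, d)
    print(f"k=3, dilation={d}: same padding = {pad}, allowed on DLA = {fits_dla(3, d)}")

# Only dilation 1 and 2 pass for a 3x3 kernel; rates 6, 12 and 18 all need padding >= 3.
```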
Hi,
1. Based on our document, dilation can be in [1, 32].
2. We are going to test this. Will let you know later.
3. DLA has limited capacity. If the model exceeds the capacity, TensorRT will move the remaining part back to the GPU (with GPU fallback enabled, as sketched below).
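For reference, a minimal sketch of how DLA placement with GPU fallback is usually configured through the builder config (network construction and parsing omitted):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# DLA only supports FP16/INT8, so enable reduced precision.
config.set_flag(trt.BuilderFlag.FP16)

# Prefer the DLA for every layer it supports ...
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

# ... and let TensorRT move unsupported layers back to the GPU.
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
```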
Thanks.
Hi,
Could you share what kind of incorrect output you observe?
We can reproduce some precision issues in our environment.
We are going to feed this issue back to our internal team, but want to check whether this matches your issue first.
[DLA]
-0.360107 -0.47998 -0.568359 -0.537109 -0.505371 -0.473633 -0.292236 -0.142578 0.00717545 0.156982 0.235474 0.164307 0.0929565 0.0216522 -0.0318298 -0.0140457 0.00375366
[GPU]
-0.359994 -0.479992 -0.568423 -0.536856 -0.505289 -0.473722 -0.292378 -0.142602 0.00717399 0.15695 0.235457 0.164188 0.0929186 0.0216494 -0.0318317 -0.0140436 0.00374448
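For reference, one quick way to tell FP16-level rounding from a real mismatch is to compare the two outputs with a tolerance; a small sketch (the arrays are just the first few values pasted above):

```python
import numpy as np

dla_out = np.array([-0.360107, -0.47998, -0.568359, -0.537109])
gpu_out = np.array([-0.359994, -0.479992, -0.568423, -0.536856])

# FP16 carries roughly 3 significant decimal digits, so relative
# differences around 1e-3 are expected; much larger gaps suggest a bug.
print(np.abs(dla_out - gpu_out).max())
print(np.allclose(dla_out, gpu_out, rtol=1e-2, atol=1e-3))
```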
Thanks.
Hi,
Thanks for the clarification.
So the difference starts at 17 output channels, is that correct?
Thanks.
Yes. Less than or equal to 16 works, but greater than or equal to 17 fails.
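For reference, the kind of network I am testing looks roughly like the sketch below; the shapes, weights, and names are placeholders rather than the actual contents of deconv_test.txt:

```python
import numpy as np
import tensorrt as trt

def build_deconv_engine(num_output_maps, in_channels=8, kernel=(4, 4)):
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    inp = network.add_input("input", trt.float32, (1, in_channels, 16, 16))

    # Deconvolution weights: in_channels * num_output_maps * kH * kW values.
    w = np.random.rand(in_channels, num_output_maps, *kernel).astype(np.float32)
    b = np.zeros(num_output_maps, dtype=np.float32)
    deconv = network.add_deconvolution_nd(
        inp, num_output_maps, kernel, trt.Weights(w), trt.Weights(b))
    deconv.stride_nd = (2, 2)
    network.mark_output(deconv.get_output(0))

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28
    config.set_flag(trt.BuilderFlag.FP16)          # DLA needs FP16/INT8
    config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = 0
    return builder.build_engine(network, config)

# Output channels <= 16 gives correct results; >= 17 does not.
engine_16 = build_deconv_engine(16)
engine_17 = build_deconv_engine(17)
```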
Hi,
We can reproduce this in our environment.
Will update more information with you once we find anything.
Thanks.
Could you confirm whether a convolution layer with kernel size 3, dilation 6, and padding 6 can run on the DLA? I was not able to run it on the DLA and it required GPU fallback.
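For reference, I am checking support for that exact layer with something like the sketch below (input and channel shapes are arbitrary); can_run_on_DLA reports whether the builder will place the layer on the DLA:

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("input", trt.float32, (1, 16, 64, 64))
w = np.random.rand(32, 16, 3, 3).astype(np.float32)
b = np.zeros(32, dtype=np.float32)
conv = network.add_convolution_nd(inp, 32, (3, 3), trt.Weights(w), trt.Weights(b))
conv.dilation_nd = (6, 6)
conv.padding_nd = (6, 6)   # "same" padding for kernel 3, dilation 6
network.mark_output(conv.get_output(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

print("can run on DLA:", config.can_run_on_DLA(conv))
```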
Hi,
Sorry to keep you waiting.
This issue has been passed to our internal team.
We will update you once we have any progress.
Thanks.
Hi,
Sorry that this issue is still under investigation.
Will keep you updated once we get any feedback from our internal team.
Thanks.
Hi,
Thanks for your patience.
Deconvolution with num_output > 16 will be supported in a future TensorRT release.
Will keep you updated once it is publicly available.
Thanks.
Hi,
Thanks for the update. Do you know if there is an update on issue number 1 in the original post?
Here is a more specific description of issue 1:
Although dilation is supported in [1, 32], the padding required to keep the output feature map the same size as the input is not supported/allowed.
For example, a layer with dilation d=6 and kernel size k=3 would require padding of d*(k-1)/2 = 6*(3-1)/2 = 6, which is not allowed since the padding must be less than the kernel size of 3. In general, any dilation > 2 has this problem.
Since the output of the dilated convolution can't have the same size as the input, dilation on the DLA is very difficult to use, especially for networks that use multiple dilations in parallel.
Hi,
Sorry for the late update.
It seems there are some limitations when the input size is required to equal the output size.
But please understand that the DLA is a hardware-based engine, which limits its flexibility for various use cases.
Thanks.