I have encountered some issues running networks similar to DeepLab v3+ (https://arxiv.org/pdf/1802.02611.pdf) for semantic segmentation on the DLA:
- According to the official NVIDIA docs, dilation is supported in convolution layers, but the padding must be less than the kernel size. This effectively means that only a dilation of 2 can preserve the feature-map height and width after a dilated convolution, since "same" padding is dilation*(kernel_size-1)/2. Padding layers are not supported on the DLA, so is there some way to run larger dilation values on the DLA?
- There appears to be a bug in deconvolution layers when the number of output channels is more than 16. The model runs on the DLA but produces incorrect results. Here is an example:
deconv_test.txt (2.5 KB)
- Concatenation sometimes doesn't work on the DLA and requires GPU fallback, even when both layers being concatenated produce their outputs in DLA memory.
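To make the dilation/padding arithmetic in the first point concrete, here is a small sketch. The helper names are my own; the "same" padding formula and the padding < kernel_size limit are the ones stated above.

```python
# Hypothetical helpers illustrating the DLA dilation constraint described above.

def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding needed so a stride-1 dilated convolution preserves H/W."""
    return dilation * (kernel_size - 1) // 2

def dla_allows(kernel_size: int, dilation: int) -> bool:
    """DLA requires padding < kernel_size for convolution layers."""
    return same_padding(kernel_size, dilation) < kernel_size

# For a 3x3 kernel: dilation 2 needs padding 2 (< 3, allowed),
# but dilation 4 would need padding 4 (>= 3, rejected by the DLA).
```

For any odd kernel size >= 3 the constraint works out to dilation <= 2, which is why the larger ASPP rates in DeepLab v3+ can't preserve the feature-map size on the DLA.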
All of these are with JetPack 4.4 DP and TensorRT 7.