DLA bugs when using a DeepLabV3-style network

I have tested with a dilation of 6 on the DLA, and the error message I get is that the padding must be less than the kernel size.
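If I understand the restriction correctly, the error is consistent with how "same" padding grows with dilation. A quick sketch of the arithmetic (assuming a 3x3 kernel with "same" padding, as in DeepLabV3's ASPP branches; the `same_padding` helper is mine, not from the model code):

```python
# Minimal sketch of the padding arithmetic (assumption: 3x3 kernel,
# "same" padding as typically used in DeepLabV3's ASPP branches).
def same_padding(kernel_size, dilation):
    # Effective kernel extent with dilation is dilation * (kernel_size - 1) + 1,
    # so "same" padding for an odd kernel is:
    return dilation * (kernel_size - 1) // 2

kernel_size = 3
for dilation in (1, 6, 12, 18):
    pad = same_padding(kernel_size, dilation)
    # If the DLA requires padding < kernel_size, any dilation >= 3
    # with a 3x3 kernel would violate it, while the GPU accepts it.
    print(dilation, pad, pad < kernel_size)
```

With dilation 6 the padding comes out to 6, which is not less than the kernel size of 3, so a padding-must-be-less-than-kernel-size check would reject the layer even though the configuration is valid on the GPU.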
The image outputs are below, in order:
1. 16 output channels for deconv, running on DLA
2. 16 output channels for deconv, running on GPU
3. 17 output channels for deconv, running on DLA
4. 17 output channels for deconv, running on GPU

deconv16_dla
deconv16_gpu
deconv17_dla
deconv17_gpu