DLA-enabled network considerably slower

Hi, I’ve checked similar issues like this one and have also been through the documentation on DLA. On GPU only, my custom network takes 11.683681 ms. If I build the same network on DLA with GPU fallback enabled, the time is 65.731544 ms. I also get some warnings like:

1594648942:152:220 Warning : WARNING: (Unnamed Layer* 93) [Pooling]: DLA only supports windows in the range of [1-8].                                                                                                
1594648942:152:297 Warning : WARNING: (Unnamed Layer* 93) [Pooling]: DLA only supports strides in the range of [1-16].                                                                                               
1594648942:152:337 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 93) [Pooling] is not supported on DLA, falling back to GPU.                                                                   
1594648942:152:375 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 97) [Resize] is not supported on DLA, falling back to GPU.                                                                    
1594648942:152:398 Warning : WARNING: (Unnamed Layer* 98) [Pooling]: DLA only supports windows in the range of [1-8].                                                                                                
1594648942:152:419 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 98) [Pooling] is not supported on DLA, falling back to GPU.                                                                   
1594648942:152:463 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 102) [Resize] is not supported on DLA, falling back to GPU.                                                                   
1594648942:152:501 Warning : WARNING: (Unnamed Layer* 103) [Pooling]: DLA only supports windows in the range of [1-8].                                                                                               
1594648942:152:536 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 103) [Pooling] is not supported on DLA, falling back to GPU.
1594648942:152:572 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 107) [Resize] is not supported on DLA, falling back to GPU.
1594648942:152:617 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 112) [Resize] is not supported on DLA, falling back to GPU.
1594648942:152:654 Warning : WARNING: Default DLA is enabled but layer (Unnamed Layer* 117) [Resize] is not supported on DLA, falling back to GPU.
1594648942:837:904 Warning : WARNING: Internal DLA error for layer (Unnamed Layer* 140) [Deconvolution]. Switching to GPU fallback.
1594648942:838:102 Warning : WARNING: Internal DLA error for layer (Unnamed Layer* 140) [Deconvolution]. Switching to GPU fallback.
1594648965:375:357 Warning : WARNING: No implementation obeys reformatting-free rules, at least 18 reformatting nodes are needed, now picking the fastest path instead.

While I understand that there are some constraints for DLA, especially regarding supported layers, my question is the following: if a layer is not supported and runs on the GPU instead, is the data copied back and forth? That is, for an unsupported layer, from DLA to GPU and then back to DLA?
What other reasons could explain this huge difference in inference time?
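For reference, enabling DLA with GPU fallback looks roughly like this with TensorRT's Python API (a sketch only; the network-building/parsing step is omitted, and the exact flags may differ between TensorRT versions):

```python
import tensorrt as trt

# Sketch: build-time configuration for running a network on DLA
# with GPU fallback for unsupported layers.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
# ... populate `network` here, e.g. via an ONNX parser ...

config = builder.create_builder_config()
# Place layers on DLA core 0 by default.
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
# Let unsupported layers fall back to the GPU instead of failing the build.
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
# DLA requires reduced precision (FP16 or INT8).
config.set_flag(trt.BuilderFlag.FP16)
```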

Kind regards

DLA peak performance is 5 TOPS @ INT8 or 2.5 TOPS @ FP16, while the Xavier GPU is 22 TOPS @ INT8 or 11 TOPS @ FP16. So the GPU is approximately 4.4 times faster than the DLA.

Considering the other restrictions and warnings you have mentioned, your results look plausible to me.
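The back-of-the-envelope arithmetic, using the peak-throughput figures above and the timings from the original post:

```python
# Peak throughput on Xavier (from the figures above).
dla_int8_tops = 5.0
gpu_int8_tops = 22.0

# Theoretical GPU advantage from raw compute alone.
theoretical_ratio = gpu_int8_tops / dla_int8_tops
print(theoretical_ratio)  # 4.4

# Observed slowdown from the measurements in the question.
observed_ratio = 65.731544 / 11.683681
print(round(observed_ratio, 2))  # 5.63
```

The observed ~5.6x slowdown is in the same ballpark as the ~4.4x raw-compute gap; the remainder is consistent with the fallback layers and the reformatting nodes mentioned in the warnings.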