Hi,
I’m trying to use the DLA (Deep Learning Accelerator) in my app, and I have some questions:
- I create the graph with TensorFlow 1.14 (layers API) and use the UFF parser. When I try to run the model on the DLA, I get the following warnings:
[10/08/2019-10:59:34] [W] [TRT] Default DLA is enabled but layer conv/kernel is not supported on DLA, falling back to GPU.
[10/08/2019-10:59:34] [W] [TRT] Default DLA is enabled but layer conv/bias is not supported on DLA, falling back to GPU.
Then trtexec prints the following:
[10/08/2019-10:59:34] [I] [TRT] --------------- Layers running on DLA:
[10/08/2019-10:59:34] [I] [TRT] {conv/Conv2D,conv/BiasAdd},
[10/08/2019-10:59:34] [I] [TRT] --------------- Layers running on GPU:
[10/08/2019-10:59:34] [I] [TRT] output_tensor,
So this layer runs on the DLA. But because of the warnings above, I can’t run trtexec without the --allowGPUFallback option. It looks like a bug. Could you tell me whether this is normal behavior?
-
When I ran my model on the DLA, I found that it creates a kernel on the GPU, and this kernel’s execution time depends on the first operation in the graph. I tested with conv2d and max_pool operations and got 200 µs and 2.7 ms respectively. Could you point me to where I can read about this?
-
Where can I find information about the memory limit for the DLA?
Thanks
And one more question: the name of the kernel from my second question is usually genericReformat::copyPackedKernel. Can I skip its execution?
Hello,
1.
This is because the DLA doesn’t support all TensorRT layers.
To run a model successfully, TensorRT can automatically place the unsupported layers on the GPU.
This requires you to use --allowGPUFallback to enable the feature.
Here is the DLA support matrix for your reference:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_layers
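If you build the engine with the TensorRT C++ API instead of trtexec, the same switches are available on the builder config. Below is a minimal sketch, assuming the TensorRT 6 C++ API; the file name, input/output tensor names, and dimensions are placeholders for your model:
```cpp
#include "NvInfer.h"
#include "NvUffParser.h"
#include <iostream>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    using namespace nvinfer1;

    IBuilder* builder = createInferBuilder(gLogger);
    builder->setMaxBatchSize(1);
    INetworkDefinition* network = builder->createNetworkV2(0U); // implicit batch for UFF

    // Parse the UFF model (file/tensor names and dims are placeholders).
    nvuffparser::IUffParser* parser = nvuffparser::createUffParser();
    parser->registerInput("input", Dims3(3, 224, 224), nvuffparser::UffInputOrder::kNCHW);
    parser->registerOutput("output_tensor");
    parser->parse("model.uff", *network, DataType::kFLOAT);

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1 << 28);

    // DLA requires reduced precision, so enable FP16 mode.
    config->setFlag(BuilderFlag::kFP16);

    // Prefer DLA core 0 for every layer...
    config->setDefaultDeviceType(DeviceType::kDLA);
    config->setDLACore(0);

    // ...but let unsupported layers fall back to the GPU
    // (the API equivalent of trtexec's --allowGPUFallback).
    config->setFlag(BuilderFlag::kGPU_FALLBACK);

    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << (engine ? "Engine built" : "Build failed") << std::endl;

    // Cleanup (destroy() calls) omitted for brevity.
    return engine ? 0 : 1;
}
```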
2.
Similar to question 1: conv2d is natively supported by the DLA, while max_pool falls back to the GPU implementation.
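If you want to check programmatically which layers the DLA can run, the builder config exposes a per-layer query. A minimal sketch, assuming the TensorRT 6 IBuilderConfig API; the assignDevices helper is hypothetical, and it expects a config that already has DLA as the default device type and FP16 enabled, as in the sketch above:
```cpp
#include "NvInfer.h"
#include <iostream>

// Pin each layer explicitly: DLA where supported, GPU otherwise.
void assignDevices(nvinfer1::INetworkDefinition& network, nvinfer1::IBuilderConfig& config)
{
    for (int i = 0; i < network.getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network.getLayer(i);
        if (config.canRunOnDLA(layer))
        {
            config.setDeviceType(layer, nvinfer1::DeviceType::kDLA);
        }
        else
        {
            // An unsupported pooling/convolution variant would land here.
            config.setDeviceType(layer, nvinfer1::DeviceType::kGPU);
            std::cout << "GPU fallback: " << layer->getName() << std::endl;
        }
    }
}
```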
3.
The DLA can access both the SDRAM and its internal RAM.
https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/
4. Unfortunately, you cannot.
Currently, the DLA is launched through the TensorRT API, and this path requires that layer.
Thanks.
Thank you for the answer.
1, 2.
As far as I can see in
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_layers
the DLA supports Convolution and Pooling layers. Isn’t that right?
-
I meant the memory limit per DLA core for a model. I read that it was 2 MB in TensorRT 5.1.6.1. Is that correct?
Does this limit still exist in TensorRT 6.0.1?
-
So, can I launch my model on the DLA without the TensorRT API?
Hi,
Sorry for the late update.
Here is some information for your reference:
1, 2.
There are many different types of convolution and pooling layers.
Not all combinations are supported natively by the DLA.
Based on your log, it looks like your particular conv2d configuration is outside of DLA support:
[10/08/2019-10:59:34] [W] [TRT] Default DLA is enabled but layer conv/kernel is not supported on DLA, falling back to GPU.
[10/08/2019-10:59:34] [W] [TRT] Default DLA is enabled but layer conv/bias is not supported on DLA, falling back to GPU.
3. The DLA can use the onboard memory. It should not be limited to 2 MB.
4. No. Only the TensorRT API can access the DLA on Xavier right now.
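For completeness: even at inference time, a DLA engine is driven through the TensorRT runtime. A minimal sketch of loading a serialized engine and selecting a DLA core, assuming the TensorRT 6 C++ runtime API; the engine file name is a placeholder:
```cpp
#include "NvInfer.h"
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Load a previously serialized engine ("dla_engine.trt" is a placeholder).
    std::ifstream file("dla_engine.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);

    // Select which DLA core executes the DLA portions of the engine.
    runtime->setDLACore(0);

    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // context->execute()/enqueue() as usual; TensorRT drives the DLA underneath.
    return context ? 0 : 1;
}
```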
Thanks.