Please provide the following info (check/uncheck the boxes after creating this topic):
Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other
Target Operating System
Linux
QNX
other
Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other
SDK Manager Version
1.7.0.8846
other
Host Machine Version
native Ubuntu 18.04
other
I am trying to build an ONNX model with trtexec for a DLA core with GPU fallback enabled (--useDLACore together with --allowGPUFallback), and I see a significant slowdown. When I dumped the per-layer profile, it looked like the graph is split between the DLA and the GPU, with frequent data transfers between the two.
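For reference, a build like this can be reproduced from a script. The sketch below only assembles the trtexec command line. The model path, DLA core index 0 and profile file name are placeholders, and --fp16 is included because DLA executes only FP16 or INT8 layers. Flag availability differs between TensorRT releases, so check trtexec --help on your DRIVE OS version before relying on it.

```python
import shlex
from typing import List


def build_trtexec_cmd(onnx_path: str, dla_core: int = 0,
                      profile_json: str = "profile.json") -> List[str]:
    """Assemble a trtexec command for a DLA engine with GPU fallback.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--useDLACore={dla_core}",          # place supported layers on this DLA core
        "--allowGPUFallback",                # unsupported layers fall back to the GPU
        "--fp16",                            # DLA runs FP16/INT8 layers only
        "--verbose",                         # log layer placement and reformat decisions
        "--dumpProfile",                     # print per-layer timings after the run
        f"--exportProfile={profile_json}",   # also write the timings as JSON
    ]


if __name__ == "__main__":
    cmd = build_trtexec_cmd("model.onnx")
    # Print a shell-safe version of the command.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To run it on the target instead:
    # import subprocess; subprocess.run(cmd, check=True)
```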
In particular, I found that layer 540 in the attached screenshot is the cause of the slowdown:
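The slow layers can also be ranked from the JSON written by --exportProfile instead of reading them off a screenshot. This sketch assumes the file is a list whose leading entry holds a run count and whose remaining entries carry "name" and "averageMs" keys; verify that against your own file. Treating names that contain "Reformat" as inserted copy/reformat layers is a naming heuristic, not a documented contract.

```python
from typing import Dict, List, Tuple


def summarize_profile(profile: List[Dict], top: int = 5) -> Tuple[List[Tuple[str, float]], float]:
    """Return the `top` slowest layers and the share of time spent in reformat layers."""
    # Skip records without a "name" key, e.g. the leading {"count": N} entry.
    layers = [(e["name"], float(e.get("averageMs", 0.0))) for e in profile if "name" in e]
    total = sum(ms for _, ms in layers)
    # Heuristic: TensorRT-inserted copy layers are named "Reformatting ...".
    reformat_ms = sum(ms for name, ms in layers if "Reformat" in name)
    slowest = sorted(layers, key=lambda item: item[1], reverse=True)[:top]
    share = reformat_ms / total if total > 0 else 0.0
    return slowest, share
```

Usage: load the file with json.load and pass the result in. A large reformat share would suggest the time is going into layout/precision copies rather than into the Concat itself.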
Digging further into the verbose log, I see that the layer in question is a Concat operator with four input tensors.
To perform the concat, the builder appears to be reformatting the inputs. Is that what is going on? How can I visualize this graph, and why is the tensor being broken down into smaller tensors?
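One plausible reading, which this sketch helps check: DLA and GPU layers generally use different tensor formats, so wherever execution crosses between them TensorRT inserts a reformat copy, and a Concat whose four inputs come from the other device may need one copy per input. The sketch walks layer names in execution order (for example, the "name" fields of an exported profile), tags each as DLA, GPU or reformat, counts DLA/GPU switches, and emits a Graphviz DOT chain colored by device. Tagging names that contain "ForeignNode" as DLA subgraphs and "Reformat" as copy layers is an assumption about TensorRT's naming; the chain shows execution order, not the true graph topology, for which an ONNX viewer such as Netron is more suitable.

```python
from typing import List, Optional, Tuple


def classify(name: str) -> str:
    """Heuristic device tag from a TensorRT layer name (naming is an assumption)."""
    if "Reformat" in name:
        return "reformat"
    if "ForeignNode" in name:
        return "dla"  # DLA-offloaded subgraphs are assumed to appear as {ForeignNode[...]}
    return "gpu"


def to_dot(layer_names: List[str]) -> Tuple[str, int]:
    """Chain layers in execution order as Graphviz DOT and count DLA<->GPU switches."""
    colors = {"dla": "lightblue", "gpu": "palegreen", "reformat": "orange"}
    lines = ["digraph engine {", "  rankdir=TB;", "  node [style=filled, shape=box];"]
    switches = 0
    last_device: Optional[str] = None
    for i, name in enumerate(layer_names):
        kind = classify(name)
        label = name.replace('"', '\\"')  # escape quotes for DOT labels
        lines.append(f'  n{i} [label="{label}", fillcolor={colors[kind]}];')
        if i > 0:
            lines.append(f"  n{i - 1} -> n{i};")
        # Reformat layers sit on boundaries; only real layers change the device.
        if kind != "reformat":
            if last_device is not None and kind != last_device:
                switches += 1
            last_device = kind
    lines.append("}")
    return "\n".join(lines), switches
```

Write the returned string to a file such as engine.dot and render it with Graphviz (dot -Tpng engine.dot -o engine.png). A cluster of orange nodes and device switches around layer 540 would support the boundary-copy explanation.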