Question #2: Because TensorRT ignores whatever precision flag I set, I get these warnings:
onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 3) [Shuffle] device type to GPU.
Since TRT ignores my flags, I cannot use DLA on specific layers. What is your comment and recommendation on this? (Note: I do not construct the network by hand, and I cannot for the life of me write code that just inserts layer.setPrecision(xxx) and layer.setOutputType(xxx) so that the precision flags would work, so please do not recommend that.)
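For reference, this is roughly the builder configuration I use (a minimal sketch assuming the TensorRT 7.x Python API; "model.onnx" is a placeholder path):

```python
# Sketch of the builder setup (TensorRT 7.x Python API assumed).
# "model.onnx" is a placeholder, not my actual model.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.FP16)          # request half precision
config.set_flag(trt.BuilderFlag.STRICT_TYPES)  # forbid silent type fallbacks
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # layers DLA rejects go to GPU
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0

engine = builder.build_engine(network, config)
```

Even with this configuration, the warnings above appear and the layers land on the GPU.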
Question #3 What does this warning mean, I’d be happy if you can enlighten me:
DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer Add_259
Question #3.5: What should the "a number" be in the setDLACore flag? I know the AGX has 64 Tensor Cores, but how many DLA cores does it have? Should the value be 64?
Thank you very much in advance,
Cem
Environment
TensorRT Version: 7.1.3 (version 8 caused me great trouble in the past month)
GPU Type: Nvidia AGX Xavier
Nvidia Driver Version: ?
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04
They answer Question #3.5 and mention the precision values for DLA cores in Question #2, but they offer no solution to my problem; the rest goes unanswered.
Regarding Question #3, I found a thread on the AGX forum where @AastaLLL mentioned that after JetPack 4.2.1 the DLA cores would support 32 subgraphs. Why is the limit still 8 subgraphs per DLA core on 4.5.1? DLA supports only 3 subgraphs per DLA core
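For what it's worth, here is how I picture that limit (a toy sketch of my own, NOT TensorRT's actual partitioner): the builder splits the network into contiguous runs of DLA-eligible layers, and once more than 8 such subgraphs exist on a core, the extra layers fall back to the GPU.

```python
# Toy illustration (not TensorRT's real partitioner): count contiguous
# runs of DLA-eligible layers; runs beyond the per-core limit go to GPU.
MAX_DLA_SUBGRAPHS = 8  # the limit quoted in the warning

def assign_devices(eligible):
    """eligible: list of bools, True if layer i can run on DLA.
    Returns a per-layer list of "DLA"/"GPU" assignments."""
    devices, subgraphs, in_run = [], 0, False
    for ok in eligible:
        if ok and not in_run:
            subgraphs += 1  # a new contiguous DLA subgraph starts here
            in_run = True
        elif not ok:
            in_run = False
        # layers in subgraphs past the limit fall back to the GPU
        devices.append("DLA" if ok and subgraphs <= MAX_DLA_SUBGRAPHS
                       else "GPU")
    return devices
```

With 9 eligible runs separated by GPU-only layers, the first 8 runs stay on DLA and the ninth falls back, which matches the "Switching to GPU for layer Add_259" message I see.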
I need help with Questions 1 and 2; they constitute the core of my problem.
Please note that setDLACore() sets the ID of the DLA hardware to use.
On Xavier, there are two DLA cores so the ID should be either 0 or 1.
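In other words, the valid IDs form a small fixed range. A tiny sketch of the check (the helper name pick_dla_core is my own, not a TensorRT API):

```python
# Hypothetical helper (not part of TensorRT): validate a DLA core ID
# before assigning it to config.DLA_core. Xavier exposes two cores: 0 and 1.
NUM_DLA_CORES_XAVIER = 2

def pick_dla_core(core_id, num_cores=NUM_DLA_CORES_XAVIER):
    """Return core_id if it names an existing DLA core, else raise."""
    if not 0 <= core_id < num_cores:
        raise ValueError(
            "DLA core %d does not exist; valid IDs: 0..%d"
            % (core_id, num_cores - 1))
    return core_id
```

So 64 (the Tensor Core count) is not a valid value here; only 0 or 1 is.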
Q1. This is not a bug.
When you set half mode, TensorRT will run inference on the model in half precision.
But some intermediate layers may still run in float mode for better performance, since certain layer operations are not efficient in half precision.
I can summarize my problem as follows: I cannot convert any layers to FP16, because the kSTRICT_TYPES and FP16 flags have no effect unless the precision is set explicitly on specific layers. Mixed precision + kSTRICT_TYPES, which type is chosen? - #7 by spolisetty
Since I cannot convert layers to FP16, I cannot use DLA on any layers: not just Shuffle layers, but Conv layers too. I try to run DLA on every possible layer, not on specific layers, yet every attempted DLA assignment falls back to the GPU. (Not all fall back because of the failed FP16 conversion; some fall back due to errors like "DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer Add_259".)