[03/02/2023-09:19:46] [E] Error[2]: [eglUtils.cpp::operator()::72] Error Code 2: Internal Error (Assertion (eglCreateStreamKHR) != nullptr failed. )
[03/02/2023-09:19:46] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[03/02/2023-09:19:46] [E] Engine could not be created from network
[03/02/2023-09:19:46] [E] Building engine failed
[03/02/2023-09:19:46] [E] Failed to create engine from model or file.
[03/02/2023-09:19:46] [E] Engine set up failed
[03/04/2023-21:58:47] [W] [TRT] Unable to determine GPU memory usage
[03/04/2023-21:58:47] [W] [TRT] Unable to determine GPU memory usage
[03/04/2023-21:58:47] [I] [TRT] [MemUsageChange] Init CUDA: CPU +5, GPU +0, now: CPU 17, GPU 0 (MiB)
[03/04/2023-21:58:47] [W] [TRT] CUDA initialization failure with error: 222. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
[03/04/2023-21:58:47] [E] Builder creation failed
[03/04/2023-21:58:47] [E] Failed to create engine from model or file.
[03/04/2023-21:58:47] [E] Engine set up failed
I resolved the build failure by adding the tegra-gl folder to LD_LIBRARY_PATH; however, the resulting engine's inference performance is dismal: 6.58 FPS with the DLA core enabled versus 42.71 FPS without it.
Also, with the DLA core enabled, TensorRT appears to create an arbitrary GL context, which prevents the business logic from creating its own and crashes the application.
Correct me if I am wrong, but I was under the impression that running mixed precision across the deep learning accelerator (DLA) cores with GPU fallback was expected to increase inference performance. (Puzzled.)
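For reference, here is a minimal sketch of that configuration through the TensorRT Python API (DLA as the default device, FP16 mixed precision, GPU fallback for unsupported layers). The model file name and DLA core index are hypothetical, so treat this as illustrative rather than the exact build used above.

```python
# Rough sketch of a DLA + GPU-fallback build via the TensorRT Python API.
# Assumptions (not from the thread): the network comes from a hypothetical
# "model.onnx", DLA core 0 is used, and FP16 is the mixed-precision mode.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:            # hypothetical model file
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)            # mixed precision
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # unsupported layers fall back to GPU
config.default_device_type = trt.DeviceType.DLA  # prefer the DLA
config.DLA_core = 0                              # first of the two DLA cores

engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError("Engine build failed")
with open("model_dla.engine", "wb") as f:
    f.write(engine_bytes)
```

The equivalent trtexec flags are --useDLACore=0, --allowGPUFallback, and --fp16.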
The issue itself (null pointer) was resolved by including the GL libraries in LD_LIBRARY_PATH; however, running small-batch real-time inference is still painfully slow with mixed precision. Are the DLAs designed for training acceleration only?
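As a quick sanity check for the original eglCreateStreamKHR assertion, a small ctypes probe can confirm that EGL resolves from the current LD_LIBRARY_PATH before a long engine build is attempted. The soname below is the usual one for EGL; it is an assumption based on a typical JetPack install, not something stated in the thread.

```python
# Hypothetical pre-flight check: confirm libEGL loads and exposes the
# eglCreateStreamKHR entry point that TensorRT's DLA path asserts on.
# The soname "libEGL.so.1" is assumed from a typical JetPack layout.
import ctypes

try:
    egl = ctypes.CDLL("libEGL.so.1")
except OSError as exc:
    raise SystemExit(
        "libEGL not found; make sure the Tegra GL/EGL directories are on "
        "LD_LIBRARY_PATH before starting the process"
    ) from exc

# eglGetProcAddress is part of the core EGL API and returns NULL (0) when an
# extension entry point such as eglCreateStreamKHR cannot be resolved.
egl.eglGetProcAddress.restype = ctypes.c_void_p
egl.eglGetProcAddress.argtypes = [ctypes.c_char_p]
addr = egl.eglGetProcAddress(b"eglCreateStreamKHR")
print("eglCreateStreamKHR:", "available" if addr else "missing")
```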
Thanks, I will read these resources and see what I can come up with.
Out of curiosity, where does the DLA exist physically? Is it part of the existing GPU/Tensor core arrangement, or does it have its own silicon?
Oh, wow. So the 200-and-change TOPS rating for the Orin module is derived from the DLA cores? In other words, without optimizing the engines to make use of them, a vast portion of the potential performance would remain untapped. Definitely good to know.
One final question: will optimizing my engines to make use of the DLA cores provide meaningful computational advantages for real-time single-batch inference, or are they designed exclusively for multi-batch training applications?
Sorry for the confusion - the GPU/Tensor cores provide most of the AI compute (about 2/3 on the Jetson AGX Orin SoC) and the two DLA cores provide the remaining 1/3. So, on modules without any DLA cores, like the Nano, all of the AI compute comes from the GPU.
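To put rough numbers on that split, here is a small illustration that reuses the round "200 and change TOPS" figure quoted earlier in the thread; the exact rating varies by Orin SKU and sparsity assumptions, so the values are only indicative.

```python
# Indicative arithmetic only: the total reuses the thread's rough
# "200 and change TOPS" figure; actual ratings depend on the Orin SKU.
total_tops = 200
gpu_share, dla_share = 2 / 3, 1 / 3    # split described in the reply above

gpu_tops = total_tops * gpu_share      # GPU + Tensor cores
dla_tops = total_tops * dla_share      # both DLA cores combined
print(f"GPU/Tensor cores: ~{gpu_tops:.0f} TOPS, DLAs: ~{dla_tops:.0f} TOPS")
```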