I have two Jetson AGX Xavier boards with different Jetpack versions.
One is Jetpack 4.3 (with TensorRT 6.0.1) and the other is Jetpack 4.4 (with TensorRT 7.1.3).
I will refer to these two devices as Device A and Device B:
Device A: Jetson AGX Xavier with Jetpack 4.3
Device B: Jetson AGX Xavier with Jetpack 4.4
I ran the same code on the DLA of both devices and found some differences.
On both devices, I set the MAXN power mode with nvpmodel and ran jetson_clocks.
1. Inference takes longer on Device B than on Device A.
When I run my code on Device A, it takes 9.4 seconds, but on Device B it takes 11.9 seconds.
I don't know why Device B is slower.
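When comparing the two boards, it may help to rule out measurement effects first: if the 9.4 s and 11.9 s figures include the first run, one-time setup can dominate the result. Below is a minimal sketch of a timing helper, assuming the inference call can be wrapped in a callable; `measureMeanMs`, `warmup`, and `iters` are hypothetical names, not part of TensorRT.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not a TensorRT API): runs `work` `warmup` times untimed,
// then returns the mean wall-clock time in milliseconds over `iters` timed runs.
double measureMeanMs(const std::function<void()>& work,
                     std::size_t warmup, std::size_t iters) {
    for (std::size_t i = 0; i < warmup; ++i) {
        work();  // warm-up runs absorb one-time costs before timing starts
    }
    if (iters == 0) {
        return 0.0;  // nothing timed
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = end - start;
    return total.count() / static_cast<double>(iters);
}
```

If the inference call is asynchronous (for example, TensorRT's `enqueue`), the callable should also wait for completion (e.g. with `cudaStreamSynchronize`); otherwise only the launch time is measured.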
2. Device B can create fewer execution contexts than Device A.
I create multiple contexts as in the following example code:
for (int j = 0; j < NUM_CONTEXTS; j++) {
    context[j] = engine->createExecutionContext();
    assert(context[j] != nullptr);  // check each context inside the loop
}
On Device A, I can create multiple execution contexts, but on Device B an error occurs once a certain number of contexts have been created.
As a result, I can create 8 contexts on Device A but only 4 on Device B.
The error on Device B looks like this:
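The loop above relies on `assert`, which is compiled out when `NDEBUG` is defined, so in a release build a failed creation would go unnoticed until the null pointer is used. A minimal sketch for finding the exact limit on each board, assuming a failing `createExecutionContext()` returns `nullptr` rather than aborting: it creates objects until the first failure and reports how many succeeded. It is written against a generic factory so it compiles without TensorRT; `createUntilFailure` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical probe (not a TensorRT API): calls `create` up to `maxCount`
// times and stops at the first nullptr instead of asserting, so the check
// still runs in release builds where NDEBUG disables assert().
// Successfully created objects are appended to `out`; the caller owns them.
template <typename T, typename Factory>
std::size_t createUntilFailure(Factory create, std::size_t maxCount,
                               std::vector<T*>& out) {
    std::size_t created = 0;
    for (; created < maxCount; ++created) {
        T* obj = create();
        if (obj == nullptr) {
            break;  // first failure: `created` is the number of successes
        }
        out.push_back(obj);
    }
    return created;
}
```

With TensorRT 7 this could be called as `createUntilFailure([&] { return engine->createExecutionContext(); }, 8, contexts)` with `std::vector<nvinfer1::IExecutionContext*> contexts`, releasing each context with `destroy()` afterwards.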
NvMapMemAllocInternalTagged: 1074810371 error 12
NvMapMemHandleAlloc: error 12
NVMEDIA_DLA : 1686, ERROR: runtime loadBare failed. err: 0x6.
…/rtExt/dla/native/dlaUtils.cpp (166) - DLA Error in deserialize: 7 (NvMediaDlaLoadLoadable : load loadable failed.)
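The `error 12` in the two NvMap lines looks like a Linux `errno` value. Assuming it is, 12 corresponds to `ENOMEM` ("Cannot allocate memory"), which would suggest that the NvMap allocation for the DLA loadable fails for lack of memory once enough contexts exist. A small sketch for decoding such codes on the board; `describeErrno` is a hypothetical name:

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Decodes a numeric error code using the C library's message table.
// Assumes the code is a Linux errno value (on Linux, ENOMEM == 12).
std::string describeErrno(int code) {
    return std::string(std::strerror(code));
}
```

The exact message text depends on the C library (glibc prints "Cannot allocate memory"), so compare against the `ENOMEM` constant rather than the string.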
Another difference is that createExecutionContext() takes longer on Device B than on Device A: on Device B each context takes about a second to create, whereas Device A creates multiple execution contexts almost immediately.
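To quantify the roughly one second per context on Device B, each call can be timed individually rather than timing the loop as a whole; this shows whether every creation is slow or only some of them. A minimal sketch, assuming the creation step is wrapped in a callable that receives the loop index; `timeEachCall` is a hypothetical helper, not a TensorRT API.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: invokes fn(i) for i = 0..count-1 and returns the
// wall-clock duration of each individual call in milliseconds.
template <typename Fn>
std::vector<double> timeEachCall(Fn fn, std::size_t count) {
    std::vector<double> millis;
    millis.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn(i);
        const auto end = std::chrono::steady_clock::now();
        millis.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return millis;
}
```

Applied to the loop from the question, the callable would be something like `[&](std::size_t j) { context[j] = engine->createExecutionContext(); }` with `NUM_CONTEXTS` as the count.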
Do you know why these issues occur with Jetpack 4.4 and TensorRT 7.1.3?