JetPack version deep learning issue

We are currently using JetPack 4.2.1 for deep learning, mainly for the latest TensorRT 5.1 (which provides access to the DLA API on Xavier). With JetPack 4.2, parsing and inference took 15 ms. After upgrading to JetPack 4.2.1, the same process takes 27 ms, some 13 ms more than the previous version, with the same input and procedure.
What causes this?
Thanks.

Hi,

Do you mean the compile time to convert a model from its original framework (e.g. Caffe, TensorFlow) into a TensorRT engine?

The conversion time depends on the algorithm, and the algorithm is chosen automatically by TensorRT.
With an improved algorithm, it may take longer to compile the TensorRT engine.
But this conversion is a one-time job, and you should get some performance improvement at inference time.
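Because the build is a one-time cost, a common pattern is to serialize the optimized engine to disk the first time and reload it afterwards. A minimal Python sketch of that caching pattern, using a stand-in `build_engine` function in place of the real TensorRT build call (file name and function are hypothetical, for illustration only):

```python
import os
import time

ENGINE_PATH = "model.plan"  # hypothetical cache file for the serialized engine

def build_engine():
    """Stand-in for the expensive TensorRT build/optimization step.

    In real code this is where TensorRT would run its algorithm
    selection; here we just simulate a slow build that yields bytes.
    """
    time.sleep(0.1)  # simulate the lengthy compilation phase
    return b"optimized-engine-bytes"

def load_or_build():
    # Reuse the serialized engine if it exists, so the long build
    # happens only once per model/platform.
    if os.path.exists(ENGINE_PATH):
        with open(ENGINE_PATH, "rb") as f:
            return f.read()
    engine = build_engine()
    with open(ENGINE_PATH, "wb") as f:
        f.write(engine)
    return engine
```

The first call pays the build cost; every later call just reads the cached file. Note that a serialized engine is specific to the TensorRT version and GPU it was built on, so it should be rebuilt after a JetPack upgrade.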

Thanks.

Hi, I have a problem. On JetPack 4.2, I measured the per-layer time of GoogLeNet:

./trtexec --deploy=/usr/src/tensorrt/data/googlenet/googlenet.prototxt --model=/usr/src/tensorrt/data/googlenet/googlenet.caffemodel --output=prob --batch=10 --int8=true
deploy: /usr/src/tensorrt/data/googlenet/googlenet.prototxt
model: /usr/src/tensorrt/data/googlenet/googlenet.caffemodel
output: prob
batch: 10
int8
Input "data": 3x224x224
Output "prob": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0                                0.090ms
conv1/7x7_s2 + conv1/relu_7x7                                                    0.754ms
pool1/3x3_s2                                                                     0.145ms
pool1/norm1                                                                      0.077ms
conv2/3x3_reduce + conv2/relu_3x3_reduce                                         0.099ms
conv2/3x3 + conv2/relu_3x3                                                       0.494ms
conv2/norm2 input reformatter 0                                                  0.698ms
conv2/norm2                                                                      0.217ms
pool2/3x3_s2                                                                     0.111ms
inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_ 0.139ms
inception_3a/3x3 + inception_3a/relu_3x3                                         0.125ms
inception_3a/5x5 + inception_3a/relu_5x5                                         0.072ms
inception_3a/pool                                                                0.079ms
inception_3a/pool_proj + inception_3a/relu_pool_proj                             0.040ms
inception_3a/1x1 copy                                                            0.016ms
inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_ 0.163ms
inception_3b/3x3 + inception_3b/relu_3x3                                         0.222ms
inception_3b/5x5 + inception_3b/relu_5x5                                         0.108ms
inception_3b/pool                                                                0.162ms
inception_3b/pool_proj + inception_3b/relu_pool_proj                             0.053ms
inception_3b/1x1 copy                                                            0.026ms
pool3/3x3_s2                                                                     0.093ms
inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_ 0.067ms
inception_4a/3x3 + inception_4a/relu_3x3                                         0.061ms
inception_4a/5x5 + inception_4a/relu_5x5                                         0.031ms
inception_4a/5x5 + inception_4a/relu_5x5 output reformatter 0                    0.034ms
inception_4a/pool                                                                0.079ms
inception_4a/pool_proj + inception_4a/relu_pool_proj                             0.028ms
inception_4a/1x1 copy                                                            0.014ms
inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_ 0.073ms
inception_4b/3x3 + inception_4b/relu_3x3                                         0.074ms
inception_4b/5x5 + inception_4b/relu_5x5 input reformatter 0                     0.020ms
inception_4b/5x5 + inception_4b/relu_5x5                                         0.034ms
inception_4b/pool                                                                0.084ms
inception_4b/pool_proj + inception_4b/relu_pool_proj                             0.035ms
inception_4b/1x1 copy                                                            0.012ms
inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_ 0.074ms
inception_4c/3x3 + inception_4c/relu_3x3                                         0.074ms
inception_4c/5x5 + inception_4c/relu_5x5                                         0.037ms
inception_4c/pool                                                                0.084ms
inception_4c/pool_proj + inception_4c/relu_pool_proj                             0.034ms
inception_4c/1x1 copy                                                            0.010ms
inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_ 0.072ms
inception_4d/3x3 + inception_4d/relu_3x3 input reformatter 0                     0.086ms
inception_4d/3x3 + inception_4d/relu_3x3                                         0.138ms
inception_4d/3x3 + inception_4d/relu_3x3 output reformatter 0                    0.067ms
inception_4d/5x5 + inception_4d/relu_5x5                                         0.036ms
inception_4d/5x5 + inception_4d/relu_5x5 output reformatter 0                    0.010ms
inception_4d/pool                                                                0.084ms
inception_4d/pool_proj + inception_4d/relu_pool_proj                             0.036ms
inception_4d/pool_proj + inception_4d/relu_pool_proj output reformatter 0        0.011ms
inception_4d/1x1 copy                                                            0.019ms
inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_ 0.051ms
inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_ 0.104ms
inception_4e/3x3 + inception_4e/relu_3x3                                         0.139ms
inception_4e/5x5 + inception_4e/relu_5x5                                         0.042ms
inception_4e/pool                                                                0.065ms
inception_4e/pool_proj + inception_4e/relu_pool_proj                             0.071ms
inception_4e/1x1 copy                                                            0.016ms
pool4/3x3_s2                                                                     0.045ms
inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_ 0.042ms
inception_5a/3x3 + inception_5a/relu_3x3                                         0.048ms
inception_5a/5x5 + inception_5a/relu_5x5                                         0.025ms
inception_5a/pool                                                                0.038ms
inception_5a/pool_proj + inception_5a/relu_pool_proj                             0.024ms
inception_5a/1x1 copy                                                            0.007ms
inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_ 0.053ms
inception_5b/3x3 + inception_5b/relu_3x3                                         0.061ms
inception_5b/5x5 + inception_5b/relu_5x5                                         0.037ms
inception_5b/pool                                                                0.038ms
inception_5b/pool_proj + inception_5b/relu_pool_proj                             0.037ms
inception_5b/1x1 copy                                                            0.009ms
pool5/7x7_s1                                                                     0.022ms
loss3/classifier input reformatter 0                                             0.008ms
loss3/classifier                                                                 0.068ms
prob                                                                             0.008ms
Time over all layers: 6.558

But on JetPack 4.2.1, the time consumed increases, as follows:

./trtexec --deploy=/usr/src/tensorrt/data/googlenet/googlenet.prototxt --model=/usr/src/tensorrt/data/googlenet/googlenet.caffemodel --output=prob --batch=10 --int8=true
deploy: /usr/src/tensorrt/data/googlenet/googlenet.prototxt
model: /usr/src/tensorrt/data/googlenet/googlenet.caffemodel
output: prob
batch: 10
int8
Input "data": 3x224x224
Output "prob": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0                                0.358ms
conv1/7x7_s2 + conv1/relu_7x7                                                    0.744ms
pool1/3x3_s2                                                                     0.323ms
pool1/norm1                                                                      0.165ms
conv2/3x3_reduce + conv2/relu_3x3_reduce                                         0.151ms
conv2/3x3 + conv2/relu_3x3                                                       0.926ms
conv2/norm2                                                                      0.422ms
pool2/3x3_s2                                                                     0.266ms
inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_ 0.157ms
inception_3a/3x3 + inception_3a/relu_3x3                                         0.247ms
inception_3a/5x5 + inception_3a/relu_5x5                                         0.117ms
inception_3a/pool                                                                0.244ms
inception_3a/pool_proj + inception_3a/relu_pool_proj                             0.089ms
inception_3a/1x1 copy                                                            0.027ms
inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_ 0.294ms
inception_3b/3x3 + inception_3b/relu_3x3                                         0.427ms
inception_3b/5x5 + inception_3b/relu_5x5                                         0.215ms
inception_3b/pool                                                                0.323ms
inception_3b/pool_proj + inception_3b/relu_pool_proj                             0.077ms
inception_3b/1x1 copy                                                            0.046ms
pool3/3x3_s2                                                                     0.184ms
inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_ 0.123ms
inception_4a/3x3 + inception_4a/relu_3x3                                         0.119ms
inception_4a/5x5 + inception_4a/relu_5x5                                         0.047ms
inception_4a/5x5 + inception_4a/relu_5x5 output reformatter 0                    0.073ms
inception_4a/pool                                                                0.157ms
inception_4a/pool_proj + inception_4a/relu_pool_proj                             0.042ms
inception_4a/1x1 copy                                                            0.025ms
inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_ 0.124ms
inception_4b/3x3 + inception_4b/relu_3x3                                         0.144ms
inception_4b/5x5 + inception_4b/relu_5x5 input reformatter 0                     0.041ms
inception_4b/5x5 + inception_4b/relu_5x5                                         0.048ms
inception_4b/pool                                                                0.166ms
inception_4b/pool_proj + inception_4b/relu_pool_proj                             0.062ms
inception_4b/1x1 copy                                                            0.022ms
inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_ 0.144ms
inception_4c/3x3 + inception_4c/relu_3x3                                         0.164ms
inception_4c/5x5 + inception_4c/relu_5x5                                         0.043ms
inception_4c/pool                                                                0.167ms
inception_4c/pool_proj + inception_4c/relu_pool_proj                             0.067ms
inception_4c/1x1 copy                                                            0.018ms
inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_ 0.141ms
inception_4d/3x3 + inception_4d/relu_3x3 input reformatter 0                     0.174ms
inception_4d/3x3 + inception_4d/relu_3x3                                         0.252ms
inception_4d/3x3 + inception_4d/relu_3x3 output reformatter 0                    0.139ms
inception_4d/5x5 + inception_4d/relu_5x5                                         0.042ms
inception_4d/5x5 + inception_4d/relu_5x5 output reformatter 0                    0.020ms
inception_4d/pool                                                                0.167ms
inception_4d/pool_proj + inception_4d/relu_pool_proj                             0.066ms
inception_4d/pool_proj + inception_4d/relu_pool_proj output reformatter 0        0.020ms
inception_4d/1x1 copy                                                            0.036ms
inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_ 0.101ms
inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_ 0.191ms
inception_4e/3x3 + inception_4e/relu_3x3                                         0.257ms
inception_4e/5x5 + inception_4e/relu_5x5                                         0.066ms
inception_4e/pool                                                                0.133ms
inception_4e/pool_proj + inception_4e/relu_pool_proj                             0.138ms
inception_4e/1x1 copy                                                            0.029ms
pool4/3x3_s2                                                                     0.087ms
inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_ 0.063ms
inception_5a/3x3 + inception_5a/relu_3x3                                         0.093ms
inception_5a/5x5 + inception_5a/relu_5x5                                         0.042ms
inception_5a/pool                                                                0.074ms
inception_5a/pool_proj + inception_5a/relu_pool_proj                             0.042ms
inception_5a/1x1 copy                                                            0.013ms
inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_ 0.097ms
inception_5b/3x3 + inception_5b/relu_3x3                                         0.125ms
inception_5b/5x5 + inception_5b/relu_5x5                                         0.067ms
inception_5b/pool                                                                0.076ms
inception_5b/pool_proj + inception_5b/relu_pool_proj                             0.043ms
inception_5b/1x1 copy                                                            0.016ms
pool5/7x7_s1                                                                     0.038ms
loss3/classifier input reformatter 0                                             0.015ms
loss3/classifier                                                                 0.140ms
prob                                                                             0.016ms
Time over all layers: 10.618
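Logs like the two above can be compared programmatically instead of by eye. A minimal Python sketch, assuming the `<layer name> ... <time>ms` line format shown in these trtexec dumps (the three sample lines are copied from the JetPack 4.2.1 log):

```python
import re

# A few per-layer lines copied from the JetPack 4.2.1 trtexec log above.
LOG = """\
conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0                                0.358ms
conv1/7x7_s2 + conv1/relu_7x7                                                    0.744ms
pool1/3x3_s2                                                                     0.323ms
"""

# Layer name, whitespace, then a decimal time ending in "ms".
LAYER_RE = re.compile(r"^(?P<name>.+?)\s+(?P<time>\d+\.\d+)ms\s*$")

def layer_times(log):
    """Return a list of (layer_name, milliseconds) parsed from trtexec output."""
    out = []
    for line in log.splitlines():
        m = LAYER_RE.match(line)
        if m:
            out.append((m.group("name"), float(m.group("time"))))
    return out

times = layer_times(LOG)
total = sum(t for _, t in times)
print(f"{len(times)} layers, total {total:.3f} ms")
```

Running the same parser over both full logs and diffing the per-layer numbers makes it easy to see whether a slowdown is uniform (which suggests a clocking issue, as it turned out here) or concentrated in a few layers (which would suggest a kernel regression).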

Q: What is the cause of this? I just upgraded JetPack.

Hi,

Thanks for reporting this.
This looks strange to me. We will try to reproduce it and get back to you with more information later.

Hi,

We cannot reproduce this issue in our environment.
In fact, performance improves from JetPack 4.2 to JetPack 4.2.1:

[JetPack-4.2/r32.1]
Average over 10 runs is 6.47702 ms (host walltime is 6.57044 ms, 99% percentile time is 6.57357).

[JetPack-4.2.1/r32.2]
Average over 10 runs is 5.46243 ms (host walltime is 5.54187 ms, 99% percentile time is 5.54166).

Have you maximized the device performance before profiling?
The profiling results may not be optimal if the device is not locked in the maximal clocks.

sudo nvpmodel -m 0
sudo jetson_clocks

Thanks.

Thanks, this solved my problem. But I have a question: if I only use the command

sudo jetson_clocks

the profiling result is still unusual, but if I run

sudo nvpmodel -m 0

sudo jetson_clocks

the profiling result is OK!

Q: What is the role of nvpmodel?

Hi,

nvpmodel helps configure the device for different use cases, e.g. low power, maximum performance, etc.
Mode ID=0 is the performance mode.

You can find the detail of each mode in our document:
https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-321/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2Fpower_management_jetson_xavier.html%23wwpID0E0EM0HA

Thanks.