I cloned the dusty-nv/jetson-inference repo (the Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson) for object detection on a Jetson Xavier.
The network I used is DetectNet.
The engine runs on the GPU by default, so I wanted to try running inference on the DLA.
I changed two default parameters of detectNet::Create() in detectNet.h:
“maxBatchSize” from 2 to 1, and “device” from DEVICE_GPU to DEVICE_DLA_0.
But inference became slower, not faster:
DLA: [TRT] layer network time - 44.175682 ms
GPU: [TRT] layer network time - 10.079423 ms
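For scale, the two reported network times can be compared directly. This is a minimal Python sketch using only the two figures quoted above; the variable names are made up for illustration. Note that the two runs also differ in batch_size (2 vs 1) and in input image (dog_0.jpg vs dog_1.jpg), so the ratio is only a rough indication.

```python
# Network times copied from the "[TRT] layer network time" lines above.
gpu_ms = 10.079423
dla_ms = 44.175682

# How many times longer the DLA run took than the GPU run.
slowdown = dla_ms / gpu_ms
print(f"DLA run took {slowdown:.2f}x as long as the GPU run")
# prints "DLA run took 4.38x as long as the GPU run"
```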
Below is the detailed output:
GPU:
nvidia@jetson-0423518027970:~/jetson-inference/build/aarch64/bin$ ./detectnet-console dog_0.jpg output_0.jpg coco-dog
detectnet-console
args (4): 0 [./detectnet-console] 1 [dog_0.jpg] 2 [output_0.jpg] 3 [coco-dog]
detectNet -- loading detection network model from:
-- prototxt networks/DetectNet-COCO-Dog/deploy.prototxt
-- model networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- class_labels networks/DetectNet-COCO-Dog/class_labels.txt
-- threshold 0.500000
-- batch_size 2
[TRT] TensorRT version 5.0.3
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16, INT8
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.2.1.GPU.FP16.engine
[TRT] loading network profile from engine cache... networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.2.1.GPU.FP16.engine
[TRT] device GPU, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 640 (SPATIAL)
-- dim #2 640 (SPATIAL)
[TRT] binding -- index 1
-- name 'coverage'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding -- index 2
-- name 'bboxes'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=2 c=3 h=640 w=640) size=9830400
[cuda] cudaAllocMapped 9830400 bytes, CPU 0x21e7f6000 GPU 0x21e7f6000
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=2 c=1 h=40 w=40) size=12800
[cuda] cudaAllocMapped 12800 bytes, CPU 0x21f156000 GPU 0x21f156000
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=2 c=4 h=40 w=40) size=51200
[cuda] cudaAllocMapped 51200 bytes, CPU 0x21f356000 GPU 0x21f356000
device GPU, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel initialized.
[cuda] cudaAllocMapped 16 bytes, CPU 0x2177b6200 GPU 0x2177b6200
detectNet -- model has 1 object classes
detectNet -- failed to find networks/DetectNet-COCO-Dog/class_labels.txt
detectNet -- maximum bounding boxes: 6400
[cuda] cudaAllocMapped 102400 bytes, CPU 0x21f556000 GPU 0x21f556000
[cuda] cudaAllocMapped 25600 bytes, CPU 0x21f362800 GPU 0x21f362800
loaded image dog_0.jpg (2049 x 1120) 36718080 bytes
[cuda] cudaAllocMapped 36718080 bytes, CPU 0x21f756000 GPU 0x21f756000
detectnet-console: beginning processing network (1553239072906)
[TRT] layer deploy_transform - 0.175968 ms
[TRT] layer conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0 - 0.141504 ms
[TRT] layer conv1/7x7_s2 + conv1/relu_7x7 - 0.898880 ms
[TRT] layer pool1/3x3_s2 - 0.170240 ms
[TRT] layer pool1/norm1 input reformatter 0 - 0.071424 ms
[TRT] layer pool1/norm1 - 0.127328 ms
[TRT] layer conv2/3x3_reduce + conv2/relu_3x3_reduce input reformatter 0 - 0.082592 ms
[TRT] layer conv2/3x3_reduce + conv2/relu_3x3_reduce - 0.105472 ms
[TRT] layer conv2/3x3 + conv2/relu_3x3 - 0.948544 ms
[TRT] layer conv2/norm2 input reformatter 0 - 0.191616 ms
[TRT] layer conv2/norm2 - 0.349952 ms
[TRT] layer pool2/3x3_s2 input reformatter 0 - 0.227488 ms
[TRT] layer pool2/3x3_s2 - 0.134816 ms
[TRT] layer inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce - 0.131072 ms
[TRT] layer inception_3a/3x3 + inception_3a/relu_3x3 - 0.234880 ms
[TRT] layer inception_3a/5x5 + inception_3a/relu_5x5 - 0.116352 ms
[TRT] layer inception_3a/pool - 0.091136 ms
[TRT] layer inception_3a/pool_proj + inception_3a/relu_pool_proj - 0.048224 ms
[TRT] layer inception_3a/1x1 copy - 0.043936 ms
[TRT] layer inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce - 0.246176 ms
[TRT] layer inception_3b/3x3 + inception_3b/relu_3x3 - 0.456288 ms
[TRT] layer inception_3b/5x5 + inception_3b/relu_5x5 - 0.196928 ms
[TRT] layer inception_3b/pool - 0.124672 ms
[TRT] layer inception_3b/pool_proj + inception_3b/relu_pool_proj - 0.081856 ms
[TRT] layer inception_3b/1x1 copy - 0.038912 ms
[TRT] layer pool3/3x3_s2 - 0.457728 ms
[TRT] layer inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce - 0.115712 ms
[TRT] layer inception_4a/3x3 + inception_4a/relu_3x3 - 0.115200 ms
[TRT] layer inception_4a/5x5 + inception_4a/relu_5x5 - 0.045920 ms
[TRT] layer inception_4a/pool - 0.069408 ms
[TRT] layer inception_4a/pool_proj + inception_4a/relu_pool_proj - 0.038784 ms
[TRT] layer inception_4a/1x1 copy - 0.016640 ms
[TRT] layer inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce - 0.113408 ms
[TRT] layer inception_4b/3x3 + inception_4b/relu_3x3 - 0.146432 ms
[TRT] layer inception_4b/5x5 + inception_4b/relu_5x5 - 0.039936 ms
[TRT] layer inception_4b/pool - 0.072032 ms
[TRT] layer inception_4b/pool_proj + inception_4b/relu_pool_proj - 0.048000 ms
[TRT] layer inception_4b/1x1 copy - 0.015392 ms
[TRT] layer inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce - 0.116544 ms
[TRT] layer inception_4c/3x3 + inception_4c/relu_3x3 - 0.146688 ms
[TRT] layer inception_4c/5x5 + inception_4c/relu_5x5 - 0.044736 ms
[TRT] layer inception_4c/pool - 0.073728 ms
[TRT] layer inception_4c/pool_proj + inception_4c/relu_pool_proj - 0.044576 ms
[TRT] layer inception_4c/1x1 copy - 0.013792 ms
[TRT] layer inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce || inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce - 0.114944 ms
[TRT] layer inception_4d/3x3 + inception_4d/relu_3x3 - 0.254720 ms
[TRT] layer inception_4d/5x5 + inception_4d/relu_5x5 - 0.043008 ms
[TRT] layer inception_4d/pool - 0.073728 ms
[TRT] layer inception_4d/pool_proj + inception_4d/relu_pool_proj - 0.045184 ms
[TRT] layer inception_4d/1x1 copy - 0.013440 ms
[TRT] layer inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce - 0.162784 ms
[TRT] layer inception_4e/3x3 + inception_4e/relu_3x3 - 0.246560 ms
[TRT] layer inception_4e/5x5 + inception_4e/relu_5x5 - 0.053664 ms
[TRT] layer inception_4e/pool - 0.076832 ms
[TRT] layer inception_4e/pool_proj + inception_4e/relu_pool_proj - 0.045152 ms
[TRT] layer inception_4e/1x1 copy - 0.020032 ms
[TRT] layer inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce - 0.227264 ms
[TRT] layer inception_5a/3x3 + inception_5a/relu_3x3 - 0.246752 ms
[TRT] layer inception_5a/5x5 + inception_5a/relu_5x5 - 0.053600 ms
[TRT] layer inception_5a/pool - 0.115392 ms
[TRT] layer inception_5a/pool_proj + inception_5a/relu_pool_proj - 0.064800 ms
[TRT] layer inception_5a/1x1 copy - 0.019360 ms
[TRT] layer inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce - 0.290592 ms
[TRT] layer inception_5b/3x3 + inception_5b/relu_3x3 - 0.302112 ms
[TRT] layer inception_5b/5x5 + inception_5b/relu_5x5 - 0.099296 ms
[TRT] layer inception_5b/pool - 0.115200 ms
[TRT] layer inception_5b/pool_proj + inception_5b/relu_pool_proj - 0.063616 ms
[TRT] layer inception_5b/1x1 copy - 0.028064 ms
[TRT] layer cvg/classifier - 0.049120 ms
[TRT] layer coverage/sig input reformatter 0 - 0.004480 ms
[TRT] layer coverage/sig - 0.006048 ms
[TRT] layer bbox/regressor - 0.068384 ms
[TRT] layer bbox/regressor output reformatter 0 - 0.004384 ms
[TRT] layer network time - 10.079423 ms
detectnet-console: finished processing network (1553239072920)
5 bounding boxes detected
detected obj 0 class #0 (class #0) confidence=0.992481
bounding box 0 (437.663574, 257.960938) (822.801514, 498.476562) w=385.137939 h=240.515625
detected obj 1 class #0 (class #0) confidence=0.787441
bounding box 1 (1565.563965, 317.078125) (2028.589966, 875.656250) w=463.026001 h=558.578125
detected obj 2 class #0 (class #0) confidence=0.887009
bounding box 2 (975.426025, 372.353516) (1202.887085, 710.609375) w=227.461060 h=338.255859
detected obj 3 class #0 (class #0) confidence=0.660282
bounding box 3 (52.094173, 381.800781) (191.443420, 567.218750) w=139.349243 h=185.417969
detected obj 4 class #0 (class #0) confidence=0.970521
bounding box 4 (216.730774, 393.080078) (399.695068, 510.371094) w=182.964294 h=117.291016
detectnet-console: writing 2049x1120 image to 'output_0.jpg'
detectnet-console: successfully wrote 2049x1120 image to 'output_0.jpg'
shutting down...
DLA:
nvidia@jetson-0423518027970:~/jetson-inference/build/aarch64/bin$ ./detectnet-console dog_1.jpg output_1.jpg coco-dog
detectnet-console
args (4): 0 [./detectnet-console] 1 [dog_1.jpg] 2 [output_1.jpg] 3 [coco-dog]
detectNet -- loading detection network model from:
-- prototxt networks/DetectNet-COCO-Dog/deploy.prototxt
-- model networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- class_labels networks/DetectNet-COCO-Dog/class_labels.txt
-- threshold 0.500000
-- batch_size 1
[TRT] TensorRT version 5.0.3
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for DLA_0: FASTEST
[TRT] requested fasted precision for device DLA_0 without providing valid calibrator, disabling INT8
[TRT] native precisions detected for DLA_0: FP32, FP16, INT8
[TRT] selecting fastest native precision for DLA_0: FP16
[TRT] attempting to open engine cache file networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT] loading network profile from engine cache... networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT] device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel loaded
[TRT] device DLA_0, enabling DLA core 0
[TRT] device DLA_0, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 640 (SPATIAL)
-- dim #2 640 (SPATIAL)
[TRT] binding -- index 1
-- name 'coverage'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding -- index 2
-- name 'bboxes'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=640 w=640) size=4915200
[cuda] cudaAllocMapped 4915200 bytes, CPU 0x21aff9000 GPU 0x21aff9000
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=40 w=40) size=6400
[cuda] cudaAllocMapped 6400 bytes, CPU 0x21b4a9000 GPU 0x21b4a9000
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=40 w=40) size=25600
[cuda] cudaAllocMapped 25600 bytes, CPU 0x21b6a9000 GPU 0x21b6a9000
device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel initialized.
[cuda] cudaAllocMapped 16 bytes, CPU 0x216b79200 GPU 0x216b79200
detectNet -- model has 1 object classes
detectNet -- failed to find networks/DetectNet-COCO-Dog/class_labels.txt
detectNet -- maximum bounding boxes: 6400
[cuda] cudaAllocMapped 102400 bytes, CPU 0x21b8a9000 GPU 0x21b8a9000
[cuda] cudaAllocMapped 25600 bytes, CPU 0x21b6af400 GPU 0x21b6af400
loaded image dog_1.jpg (1920 x 1080) 33177600 bytes
[cuda] cudaAllocMapped 33177600 bytes, CPU 0x21baa9000 GPU 0x21baa9000
detectnet-console: beginning processing network (1553242030651)
[TRT] layer data to nvm - 0.566592 ms
[TRT] layer {deploy_transform,conv1/7x7_s2,conv1/relu_7x7,pool1/3x3_s2,pool1/norm1,conv2/3x3_reduce,conv2/relu_3x3_reduce,conv2/3x3,conv2/relu_3x3,conv2/norm2,pool2/3x3_s2,inception_3a/1x1,inception_3a/relu_1x1,inception_3a/3x3_reduce,inception_3a/relu_3x3_reduce,inception_3a/3x3,inception_3a/relu_3x3,inception_3a/5x5_reduce,inception_3a/relu_5x5_reduce,inception_3a/5x5,inception_3a/relu_5x5,inception_3a/pool,inception_3a/pool_proj,inception_3a/relu_pool_proj,inception_3a/output,inception_3b/1x1,inception_3b/relu_1x1,inception_3b/3x3_reduce,inception_3b/relu_3x3_reduce,inception_3b/3x3,inception_3b/relu_3x3,inception_3b/5x5_reduce,inception_3b/relu_5x5_reduce,inception_3b/5x5,inception_3b/relu_5x5,inception_3b/pool,inception_3b/pool_proj,inception_3b/relu_pool_proj,inception_3b/output,pool3/3x3_s2,inception_4a/1x1,inception_4a/relu_1x1,inception_4a/3x3_reduce,inception_4a/relu_3x3_reduce,inception_4a/3x3,inception_4a/relu_3x3,inception_4a/5x5_reduce,inception_4a/relu_5x5_reduce,inception_4a/5x5,inception_4a/relu_5x5,inception_4a/pool,inception_4a/pool_proj,inception_4a/relu_pool_proj,inception_4a/output,inception_4b/1x1,inception_4b/relu_1x1,inception_4b/3x3_reduce,inception_4b/relu_3x3_reduce,inception_4b/3x3,inception_4b/relu_3x3,inception_4b/5x5_reduce,inception_4b/relu_5x5_reduce,inception_4b/5x5,inception_4b/relu_5x5,inception_4b/pool,inception_4b/pool_proj,inception_4b/relu_pool_proj,inception_4b/output,inception_4c/1x1,inception_4c/relu_1x1,inception_4c/3x3_reduce,inception_4c/relu_3x3_reduce,inception_4c/3x3,inception_4c/relu_3x3,inception_4c/5x5_reduce,inception_4c/relu_5x5_reduce,inception_4c/5x5,inception_4c/relu_5x5,inception_4c/pool,inception_4c/pool_proj,inception_4c/relu_pool_proj,inception_4c/output,inception_4d/1x1,inception_4d/relu_1x1,inception_4d/3x3_reduce,inception_4d/relu_3x3_reduce,inception_4d/3x3,inception_4d/relu_3x3,inception_4d/5x5_reduce,inception_4d/relu_5x5_reduce,inception_4d/5x5,inception_4d/relu_5x5,inception_4d/pool,inception_4d/pool_proj,inception_4d/relu_pool_proj,inception_4d/output,inception_4e/1x1,inception_4e/relu_1x1,inception_4e/3x3_reduce,inception_4e/relu_3x3_reduce,inception_4e/3x3,inception_4e/relu_3x3,inception_4e/5x5_reduce,inception_4e/relu_5x5_reduce,inception_4e/5x5,inception_4e/relu_5x5,inception_4e/pool,inception_4e/pool_proj,inception_4e/relu_pool_proj,inception_4e/output,inception_5a/1x1,inception_5a/relu_1x1,inception_5a/3x3_reduce,inception_5a/relu_3x3_reduce,inception_5a/3x3,inception_5a/relu_3x3,inception_5a/5x5_reduce,inception_5a/relu_5x5_reduce,inception_5a/5x5,inception_5a/relu_5x5,inception_5a/pool,inception_5a/pool_proj,inception_5a/relu_pool_proj,inception_5a/output,inception_5b/1x1,inception_5b/relu_1x1,inception_5b/3x3_reduce,inception_5b/relu_3x3_reduce,inception_5b/3x3,inception_5b/relu_3x3,inception_5b/5x5_reduce,inception_5b/relu_5x5_reduce,inception_5b/5x5,inception_5b/relu_5x5,inception_5b/pool,inception_5b/pool_proj,inception_5b/relu_pool_proj,inception_5b/output,cvg/classifier,coverage/sig,bbox/regressor} - 1.680768 ms
[TRT] layer data copy finish - 0.074016 ms
[TRT] layer bboxes from nvm - 41.842625 ms
[TRT] layer bboxes copy finish - 0.003488 ms
[TRT] layer coverage from nvm - 0.006144 ms
[TRT] layer coverage copy finish - 0.002048 ms
[TRT] layer network time - 44.175682 ms
detectnet-console: finished processing network (1553242030698)
2 bounding boxes detected
detected obj 0 class #0 (class #0) confidence=0.873047
bounding box 0 (1265.812500, 261.562500) (1670.812500, 541.792969) w=405.000000 h=280.230469
detected obj 1 class #0 (class #0) confidence=0.695312
bounding box 1 (622.687500, 336.155273) (998.718750, 563.466797) w=376.031250 h=227.311523
detectnet-console: writing 1920x1080 image to 'output_1.jpg'
detectnet-console: successfully wrote 1920x1080 image to 'output_1.jpg'
shutting down...
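To see where the DLA time goes, the per-layer lines of the DLA log above can be totalled. This is a minimal Python sketch, not part of jetson-inference: the helper name parse_layer_times is made up for illustration, the log lines are pasted in by hand, and the long fused-layer name is abbreviated.

```python
import re

# Matches profiler lines as they appear in the log above,
# e.g. "[TRT] layer bboxes from nvm - 41.842625 ms".
LAYER_RE = re.compile(r"^\[TRT\] layer (?P<name>.+) - (?P<ms>\d+\.\d+) ms$")

def parse_layer_times(log_text):
    """Return (layer_name, milliseconds) pairs, skipping the summary line."""
    times = []
    for line in log_text.splitlines():
        m = LAYER_RE.match(line.strip())
        if m and m.group("name") != "network time":
            times.append((m.group("name"), float(m.group("ms"))))
    return times

# Layer lines copied from the DLA run; the fused-layer name is shortened here.
dla_log = """\
[TRT] layer data to nvm - 0.566592 ms
[TRT] layer {deploy_transform, (fused layers abbreviated), bbox/regressor} - 1.680768 ms
[TRT] layer data copy finish - 0.074016 ms
[TRT] layer bboxes from nvm - 41.842625 ms
[TRT] layer bboxes copy finish - 0.003488 ms
[TRT] layer coverage from nvm - 0.006144 ms
[TRT] layer coverage copy finish - 0.002048 ms
[TRT] layer network time - 44.175682 ms
"""

times = parse_layer_times(dla_log)
total = sum(ms for _, ms in times)
slowest = max(times, key=lambda item: item[1])
share = 100.0 * slowest[1] / total

print(f"{slowest[0]}: {slowest[1]:.3f} ms ({share:.1f}% of {total:.3f} ms)")
# prints "bboxes from nvm: 41.843 ms (94.7% of 44.176 ms)"
```

Almost all of the DLA network time is reported under "bboxes from nvm", while the fused network layer itself is reported at 1.680768 ms. The profile alone does not show whether that step is a slow copy or time spent waiting for the DLA to finish.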