The repo I use is dusty-nv/jetson-inference, and I set its default device to DLA.
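For context, this is roughly how the network ends up being created with DLA selected (a minimal sketch, assuming the detectNet::Create() overload that takes precisionType and deviceType arguments; the exact signature may differ in your checkout):

// Minimal sketch, assuming the detectNet::Create() overload that takes
// precisionType/deviceType arguments (see tensorNet.h / detectNet.h in the
// repo for the exact signature in your checkout).
#include "detectNet.h"

int main( int argc, char** argv )
{
	// request the COCO-Dog model on DLA core 0, let TensorRT pick the
	// fastest supported precision, and allow GPU fallback for layers
	// that the DLA cannot run
	detectNet* net = detectNet::Create( detectNet::COCO_DOG,
	                                    0.5f,           /* threshold        */
	                                    1,              /* maxBatchSize     */
	                                    TYPE_FASTEST,   /* precision        */
	                                    DEVICE_DLA_0,   /* device           */
	                                    true );         /* allowGPUFallback */
	if( !net )
		return 1;

	// ... load the image, call net->Detect(), and save the output,
	// as detectnet-console does ...

	delete net;
	return 0;
}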
Below is the log output I got.
nvidia@jetson-0423518027970:~/jetson-inference/build/aarch64/bin$ ./detectnet-console dog_1.jpg output_1.jpg coco-dog
detectnet-console
args (4): 0 [./detectnet-console] 1 [dog_1.jpg] 2 [output_1.jpg] 3 [coco-dog]
detectNet -- loading detection network model from:
-- prototxt networks/DetectNet-COCO-Dog/deploy.prototxt
-- model networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- class_labels networks/DetectNet-COCO-Dog/class_labels.txt
-- threshold 0.500000
-- batch_size 1
[TRT] TensorRT version 5.0.3
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for DLA_0: FASTEST
[TRT] requested fasted precision for device DLA_0 without providing valid calibrator, disabling INT8
[TRT] native precisions detected for DLA_0: FP32, FP16, INT8
[TRT] selecting fastest native precision for DLA_0: FP16
[TRT] attempting to open engine cache file networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT] loading network profile from engine cache... networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT] device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel loaded
[TRT] device DLA_0, enabling DLA core 0
[TRT] device DLA_0, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 640 (SPATIAL)
-- dim #2 640 (SPATIAL)
[TRT] binding -- index 1
-- name 'coverage'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding -- index 2
-- name 'bboxes'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (CHANNEL)
-- dim #1 40 (SPATIAL)
-- dim #2 40 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=640 w=640) size=4915200
[cuda] cudaAllocMapped 4915200 bytes, CPU 0x21aff9000 GPU 0x21aff9000
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=40 w=40) size=6400
[cuda] cudaAllocMapped 6400 bytes, CPU 0x21b4a9000 GPU 0x21b4a9000
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=40 w=40) size=25600
[cuda] cudaAllocMapped 25600 bytes, CPU 0x21b6a9000 GPU 0x21b6a9000
device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel initialized.
[cuda] cudaAllocMapped 16 bytes, CPU 0x216b79200 GPU 0x216b79200
detectNet -- model has 1 object classes
detectNet -- failed to find networks/DetectNet-COCO-Dog/class_labels.txt
detectNet -- maximum bounding boxes: 6400
[cuda] cudaAllocMapped 102400 bytes, CPU 0x21b8a9000 GPU 0x21b8a9000
[cuda] cudaAllocMapped 25600 bytes, CPU 0x21b6af400 GPU 0x21b6af400
loaded image dog_1.jpg (1920 x 1080) 33177600 bytes
[cuda] cudaAllocMapped 33177600 bytes, CPU 0x21baa9000 GPU 0x21baa9000
detectnet-console: beginning processing network (1553242030651)
[TRT] layer data to nvm - 0.566592 ms
[TRT] layer {deploy_transform,conv1/7x7_s2,conv1/relu_7x7,pool1/3x3_s2,pool1/norm1,conv2/3x3_reduce,conv2/relu_3x3_reduce,conv2/3x3,conv2/relu_3x3,conv2/norm2,pool2/3x3_s2,inception_3a/1x1,inception_3a/relu_1x1,inception_3a/3x3_reduce,inception_3a/relu_3x3_reduce,inception_3a/3x3,inception_3a/relu_3x3,inception_3a/5x5_reduce,inception_3a/relu_5x5_reduce,inception_3a/5x5,inception_3a/relu_5x5,inception_3a/pool,inception_3a/pool_proj,inception_3a/relu_pool_proj,inception_3a/output,inception_3b/1x1,inception_3b/relu_1x1,inception_3b/3x3_reduce,inception_3b/relu_3x3_reduce,inception_3b/3x3,inception_3b/relu_3x3,inception_3b/5x5_reduce,inception_3b/relu_5x5_reduce,inception_3b/5x5,inception_3b/relu_5x5,inception_3b/pool,inception_3b/pool_proj,inception_3b/relu_pool_proj,inception_3b/output,pool3/3x3_s2,inception_4a/1x1,inception_4a/relu_1x1,inception_4a/3x3_reduce,inception_4a/relu_3x3_reduce,inception_4a/3x3,inception_4a/relu_3x3,inception_4a/5x5_reduce,inception_4a/relu_5x5_reduce,inception_4a/5x5,inception_4a/relu_5x5,inception_4a/pool,inception_4a/pool_proj,inception_4a/relu_pool_proj,inception_4a/output,inception_4b/1x1,inception_4b/relu_1x1,inception_4b/3x3_reduce,inception_4b/relu_3x3_reduce,inception_4b/3x3,inception_4b/relu_3x3,inception_4b/5x5_reduce,inception_4b/relu_5x5_reduce,inception_4b/5x5,inception_4b/relu_5x5,inception_4b/pool,inception_4b/pool_proj,inception_4b/relu_pool_proj,inception_4b/output,inception_4c/1x1,inception_4c/relu_1x1,inception_4c/3x3_reduce,inception_4c/relu_3x3_reduce,inception_4c/3x3,inception_4c/relu_3x3,inception_4c/5x5_reduce,inception_4c/relu_5x5_reduce,inception_4c/5x5,inception_4c/relu_5x5,inception_4c/pool,inception_4c/pool_proj,inception_4c/relu_pool_proj,inception_4c/output,inception_4d/1x1,inception_4d/relu_1x1,inception_4d/3x3_reduce,inception_4d/relu_3x3_reduce,inception_4d/3x3,inception_4d/relu_3x3,inception_4d/5x5_reduce,inception_4d/relu_5x5_reduce,inception_4d/5x5,inception_4d/relu_5x5,inception_4d/pool,inception_4d/pool_proj,inception_4d/relu_pool_proj,inception_4d/output,inception_4e/1x1,inception_4e/relu_1x1,inception_4e/3x3_reduce,inception_4e/relu_3x3_reduce,inception_4e/3x3,inception_4e/relu_3x3,inception_4e/5x5_reduce,inception_4e/relu_5x5_reduce,inception_4e/5x5,inception_4e/relu_5x5,inception_4e/pool,inception_4e/pool_proj,inception_4e/relu_pool_proj,inception_4e/output,inception_5a/1x1,inception_5a/relu_1x1,inception_5a/3x3_reduce,inception_5a/relu_3x3_reduce,inception_5a/3x3,inception_5a/relu_3x3,inception_5a/5x5_reduce,inception_5a/relu_5x5_reduce,inception_5a/5x5,inception_5a/relu_5x5,inception_5a/pool,inception_5a/pool_proj,inception_5a/relu_pool_proj,inception_5a/output,inception_5b/1x1,inception_5b/relu_1x1,inception_5b/3x3_reduce,inception_5b/relu_3x3_reduce,inception_5b/3x3,inception_5b/relu_3x3,inception_5b/5x5_reduce,inception_5b/relu_5x5_reduce,inception_5b/5x5,inception_5b/relu_5x5,inception_5b/pool,inception_5b/pool_proj,inception_5b/relu_pool_proj,inception_5b/output,cvg/classifier,coverage/sig,bbox/regressor} - 1.680768 ms
[TRT] layer data copy finish - 0.074016 ms
[TRT] layer bboxes from nvm - 41.842625 ms
[TRT] layer bboxes copy finish - 0.003488 ms
[TRT] layer coverage from nvm - 0.006144 ms
[TRT] layer coverage copy finish - 0.002048 ms
[TRT] layer network time - 44.175682 ms
detectnet-console: finished processing network (1553242030698)
2 bounding boxes detected
detected obj 0 class #0 (class #0) confidence=0.873047
bounding box 0 (1265.812500, 261.562500) (1670.812500, 541.792969) w=405.000000 h=280.230469
detected obj 1 class #0 (class #0) confidence=0.695312
bounding box 1 (622.687500, 336.155273) (998.718750, 563.466797) w=376.031250 h=227.311523
detectnet-console: writing 1920x1080 image to 'output_1.jpg'
detectnet-console: successfully wrote 1920x1080 image to 'output_1.jpg'
shutting down...
In the log above, there is a layer-timing line where many layers are gathered within braces. Does that mean those layers are fused into a subgraph that will run on the DLA?
The log also shows this line:
[TRT] layer bboxes from nvm - 41.842625 ms
I want to know what 'nvm' means and why this step consumes so much time.