TensorRT log info about NVDLA

The repo I use is dusty-nv/jetson-inference, and I set the default device to DLA.
Below is the log output I got.

nvidia@jetson-0423518027970:~/jetson-inference/build/aarch64/bin$ ./detectnet-console dog_1.jpg output_1.jpg coco-dog
detectnet-console
  args (4):  0 [./detectnet-console]  1 [dog_1.jpg]  2 [output_1.jpg]  3 [coco-dog]  


detectNet -- loading detection network model from:
          -- prototxt     networks/DetectNet-COCO-Dog/deploy.prototxt
          -- model        networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel
          -- input_blob   'data'
          -- output_cvg   'coverage'
          -- output_bbox  'bboxes'
          -- mean_pixel   0.000000
          -- class_labels networks/DetectNet-COCO-Dog/class_labels.txt
          -- threshold    0.500000
          -- batch_size   1

[TRT]  TensorRT version 5.0.3
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for DLA_0: FASTEST
[TRT]  requested fasted precision for device DLA_0 without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for DLA_0:  FP32, FP16, INT8
[TRT]  selecting fastest native precision for DLA_0:  FP16
[TRT]  attempting to open engine cache file networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT]  loading network profile from engine cache... networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel.1.1.DLA_0.FP16.engine
[TRT]  device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel loaded
[TRT]  device DLA_0, enabling DLA core 0
[TRT]  device DLA_0, CUDA engine context initialized with 3 bindings
[TRT]  binding -- index   0
               -- name    'data'
               -- type    FP32
               -- in/out  INPUT
               -- # dims  3
               -- dim #0  3 (CHANNEL)
               -- dim #1  640 (SPATIAL)
               -- dim #2  640 (SPATIAL)
[TRT]  binding -- index   1
               -- name    'coverage'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  1 (CHANNEL)
               -- dim #1  40 (SPATIAL)
               -- dim #2  40 (SPATIAL)
[TRT]  binding -- index   2
               -- name    'bboxes'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  4 (CHANNEL)
               -- dim #1  40 (SPATIAL)
               -- dim #2  40 (SPATIAL)
[TRT]  binding to input 0 data  binding index:  0
[TRT]  binding to input 0 data  dims (b=1 c=3 h=640 w=640) size=4915200
[cuda]  cudaAllocMapped 4915200 bytes, CPU 0x21aff9000 GPU 0x21aff9000
[TRT]  binding to output 0 coverage  binding index:  1
[TRT]  binding to output 0 coverage  dims (b=1 c=1 h=40 w=40) size=6400
[cuda]  cudaAllocMapped 6400 bytes, CPU 0x21b4a9000 GPU 0x21b4a9000
[TRT]  binding to output 1 bboxes  binding index:  2
[TRT]  binding to output 1 bboxes  dims (b=1 c=4 h=40 w=40) size=25600
[cuda]  cudaAllocMapped 25600 bytes, CPU 0x21b6a9000 GPU 0x21b6a9000
device DLA_0, networks/DetectNet-COCO-Dog/snapshot_iter_38600.caffemodel initialized.
[cuda]  cudaAllocMapped 16 bytes, CPU 0x216b79200 GPU 0x216b79200
detectNet -- model has 1 object classes
detectNet -- failed to find networks/DetectNet-COCO-Dog/class_labels.txt
detectNet -- maximum bounding boxes:  6400
[cuda]  cudaAllocMapped 102400 bytes, CPU 0x21b8a9000 GPU 0x21b8a9000
[cuda]  cudaAllocMapped 25600 bytes, CPU 0x21b6af400 GPU 0x21b6af400
loaded image  dog_1.jpg  (1920 x 1080)  33177600 bytes
[cuda]  cudaAllocMapped 33177600 bytes, CPU 0x21baa9000 GPU 0x21baa9000
detectnet-console:  beginning processing network (1553242030651)
[TRT]  layer data to nvm - 0.566592 ms
[TRT]  layer {deploy_transform,conv1/7x7_s2,conv1/relu_7x7,pool1/3x3_s2,pool1/norm1,conv2/3x3_reduce,conv2/relu_3x3_reduce,conv2/3x3,conv2/relu_3x3,conv2/norm2,pool2/3x3_s2,inception_3a/1x1,inception_3a/relu_1x1,inception_3a/3x3_reduce,inception_3a/relu_3x3_reduce,inception_3a/3x3,inception_3a/relu_3x3,inception_3a/5x5_reduce,inception_3a/relu_5x5_reduce,inception_3a/5x5,inception_3a/relu_5x5,inception_3a/pool,inception_3a/pool_proj,inception_3a/relu_pool_proj,inception_3a/output,inception_3b/1x1,inception_3b/relu_1x1,inception_3b/3x3_reduce,inception_3b/relu_3x3_reduce,inception_3b/3x3,inception_3b/relu_3x3,inception_3b/5x5_reduce,inception_3b/relu_5x5_reduce,inception_3b/5x5,inception_3b/relu_5x5,inception_3b/pool,inception_3b/pool_proj,inception_3b/relu_pool_proj,inception_3b/output,pool3/3x3_s2,inception_4a/1x1,inception_4a/relu_1x1,inception_4a/3x3_reduce,inception_4a/relu_3x3_reduce,inception_4a/3x3,inception_4a/relu_3x3,inception_4a/5x5_reduce,inception_4a/relu_5x5_reduce,inception_4a/5x5,inception_4a/relu_5x5,inception_4a/pool,inception_4a/pool_proj,inception_4a/relu_pool_proj,inception_4a/output,inception_4b/1x1,inception_4b/relu_1x1,inception_4b/3x3_reduce,inception_4b/relu_3x3_reduce,inception_4b/3x3,inception_4b/relu_3x3,inception_4b/5x5_reduce,inception_4b/relu_5x5_reduce,inception_4b/5x5,inception_4b/relu_5x5,inception_4b/pool,inception_4b/pool_proj,inception_4b/relu_pool_proj,inception_4b/output,inception_4c/1x1,inception_4c/relu_1x1,inception_4c/3x3_reduce,inception_4c/relu_3x3_reduce,inception_4c/3x3,inception_4c/relu_3x3,inception_4c/5x5_reduce,inception_4c/relu_5x5_reduce,inception_4c/5x5,inception_4c/relu_5x5,inception_4c/pool,inception_4c/pool_proj,inception_4c/relu_pool_proj,inception_4c/output,inception_4d/1x1,inception_4d/relu_1x1,inception_4d/3x3_reduce,inception_4d/relu_3x3_reduce,inception_4d/3x3,inception_4d/relu_3x3,inception_4d/5x5_reduce,inception_4d/relu_5x5_reduce,inception_4d/5x5,inception_4d/relu_5x5,inception_4d/pool,inception_4d/
pool_proj,inception_4d/relu_pool_proj,inception_4d/output,inception_4e/1x1,inception_4e/relu_1x1,inception_4e/3x3_reduce,inception_4e/relu_3x3_reduce,inception_4e/3x3,inception_4e/relu_3x3,inception_4e/5x5_reduce,inception_4e/relu_5x5_reduce,inception_4e/5x5,inception_4e/relu_5x5,inception_4e/pool,inception_4e/pool_proj,inception_4e/relu_pool_proj,inception_4e/output,inception_5a/1x1,inception_5a/relu_1x1,inception_5a/3x3_reduce,inception_5a/relu_3x3_reduce,inception_5a/3x3,inception_5a/relu_3x3,inception_5a/5x5_reduce,inception_5a/relu_5x5_reduce,inception_5a/5x5,inception_5a/relu_5x5,inception_5a/pool,inception_5a/pool_proj,inception_5a/relu_pool_proj,inception_5a/output,inception_5b/1x1,inception_5b/relu_1x1,inception_5b/3x3_reduce,inception_5b/relu_3x3_reduce,inception_5b/3x3,inception_5b/relu_3x3,inception_5b/5x5_reduce,inception_5b/relu_5x5_reduce,inception_5b/5x5,inception_5b/relu_5x5,inception_5b/pool,inception_5b/pool_proj,inception_5b/relu_pool_proj,inception_5b/output,cvg/classifier,coverage/sig,bbox/regressor} - 1.680768 ms
[TRT]  layer data copy finish - 0.074016 ms
[TRT]  layer bboxes from nvm - 41.842625 ms
[TRT]  layer bboxes copy finish - 0.003488 ms
[TRT]  layer coverage from nvm - 0.006144 ms
[TRT]  layer coverage copy finish - 0.002048 ms
[TRT]  layer network time - 44.175682 ms
detectnet-console:  finished processing network  (1553242030698)
2 bounding boxes detected
detected obj 0  class #0 (class #0)  confidence=0.873047
bounding box 0  (1265.812500, 261.562500)  (1670.812500, 541.792969)  w=405.000000  h=280.230469
detected obj 1  class #0 (class #0)  confidence=0.695312
bounding box 1  (622.687500, 336.155273)  (998.718750, 563.466797)  w=376.031250  h=227.311523
detectnet-console:  writing 1920x1080 image to 'output_1.jpg'
detectnet-console:  successfully wrote 1920x1080 image to 'output_1.jpg'

shutting down...

In the log above, several layers are grouped together inside braces in a single "layer {...}" entry, and I want to know whether that means those layers are fused into a subgraph that will run on the DLA.

The log also shows this line:

[TRT]  layer bboxes from nvm - 41.842625 ms

I want to know what 'nvm' means and why this step takes so much time.
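For reference, totaling the per-layer timings from the log shows that the 'bboxes from nvm' copy accounts for nearly all of the ~44 ms network time. A minimal sketch that parses the "[TRT]  layer <name> - <ms> ms" lines copied from the log above (the long fused-layer entry is abbreviated to "{...}" for readability):

```python
import re

# Per-layer profiler lines copied from the log above.
log = """\
[TRT]  layer data to nvm - 0.566592 ms
[TRT]  layer {...} - 1.680768 ms
[TRT]  layer data copy finish - 0.074016 ms
[TRT]  layer bboxes from nvm - 41.842625 ms
[TRT]  layer bboxes copy finish - 0.003488 ms
[TRT]  layer coverage from nvm - 0.006144 ms
[TRT]  layer coverage copy finish - 0.002048 ms
"""

# Extract (layer name, milliseconds) pairs from each profiler line.
timings = [(name, float(ms))
           for name, ms in re.findall(r"\[TRT\]\s+layer (.+) - ([\d.]+) ms", log)]

# Sort by cost so the dominant step is obvious.
for name, ms in sorted(timings, key=lambda t: -t[1]):
    print(f"{ms:10.6f} ms  {name}")

total = sum(ms for _, ms in timings)
print(f"{total:10.6f} ms  total")
```

The computed total matches the "layer network time - 44.175682 ms" line, and the sorted listing makes it clear that the 'bboxes from nvm' transfer dominates.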

Hi,

You can get the detailed hardware placement with trtexec.

cp -r tensorrt/ .
cd tensorrt/bin/
./trtexec --deploy=[prototxt] --output=[name] --verbose
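For a DLA placement report, DLA-related options can be passed as well; a sketch for this model (the flag names --useDLACore, --allowGPUFallback, and --fp16 are assumed from trtexec's help output and may vary by TensorRT version):

```shell
# Sketch: profile the DetectNet prototxt on DLA core 0 in FP16,
# letting unsupported layers fall back to the GPU.
./trtexec --deploy=networks/DetectNet-COCO-Dog/deploy.prototxt \
          --output=coverage --output=bboxes \
          --fp16 --useDLACore=0 --allowGPUFallback --verbose
```

The verbose log should then indicate which layers are placed on the DLA and which fall back to the GPU.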

Thanks.