DetectNet time delay with DIGITS-trained model. Suggestions, or is it what it is?

So I’m using a TX2 to work with some object detection software. I’m really just learning right now, and it seems there is much better object detection software out there, such as YOLO. However, I want to stick with DetectNet for now, as the code is simpler to work with and understand.

I recently followed the instructions here

https://github.com/nvidia/digits/tree/master/examples/object-detection

to create an object detection model for vehicles, using the supplied images and labels from that page. I processed it using DIGITS on an AWS instance.

Once I got the model (training only took 1.5 hours with 16 GPUs), I ran a quick test using DetectNet, and a single image takes almost 1.5 minutes to process. It is held up at

[GIE] building CUDA engine

I have still frames from video I captured, and I’m running a bash script that feeds each image into the detectnet-console executable. When I ran this test with the default person detection model, each image processed in three or four seconds, which isn’t fast, but I figured it was no big deal.

Now that the time for each image is well over a minute, it will take hours to process even a small sample size.

Is there any way to speed up how each image is processed?

Hi mascenzi80, detectnet-console saves the TensorRT bitstream for each network model that it loads. For DetectNet it takes a minute the first time it loads a particular model; the next time it loads, it should only take a second.

What happens when you run detectnet-console again with the same model?
(you should see output in the text log about saving/loading the tensorcache)
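
For example, you can check for the cache file directly before and after a run (a sketch; substitute your own model paths, and test.jpg stands in for one of your frames):

NET=networks/vehicle
ls -l $NET/*.tensorcache        # before the first run: no such file yet
./detectnet-console test.jpg out.jpg \
        --prototxt=$NET/deploy.prototxt \
        --model=$NET/snapshot_iter_11970.caffemodel \
        --input_blob=data --output_cvg=coverage --output_bbox=bboxes
ls -l $NET/*.tensorcache        # the .tensorcache should now exist next to the .caffemodel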

Each time I run detectnet-console it takes ~1.5 minutes, using the exact same model, one run right after another. Here is the text log.

What you’re saying makes sense. It hangs at building the CUDA engine, so it looks like it’s rebuilding and saving it each and every time.

0 [./detectnet-console]  1 [./output/7/image_2017_12_20_19_47_35.155268.jpg]  2 [./output/7/results/image_2017_12_20_19_47_35.155268.jpg]  3 [--prototxt=networks/vehicle/deploy.prototxt]  4 [--model=networks/vehicle/snapshot_iter_11970.caffemodel]  5 [--input_blob=data]  6 [--output_cvg=coverage]  7 [--output_bbox=bboxes]  

File name ./output/7/image_2017_12_20_19_47_35.155268.jpg 

Time Stamp ID 19_47_35.15526 

detectNet -- loading detection network model from:
          -- prototxt    networks/vehicle/deploy.prototxt
          -- model       networks/vehicle/snapshot_iter_11970.caffemodel
          -- input_blob  'data'
          -- output_cvg  'coverage'
          -- output_bbox 'bboxes'
          -- mean_pixel  0.000000
          -- threshold   0.500000
          -- batch_size  2

[GIE]  TensorRT version 2.1, build 2102
[GIE]  attempting to open cache file networks/vehicle/snapshot_iter_11970.caffemodel.2.tensorcache
[GIE]  cache file not found, profiling network model
[GIE]  platform has FP16 support.
[GIE]  loading networks/vehicle/deploy.prototxt networks/vehicle/snapshot_iter_11970.caffemodel
[GIE]  retrieved output tensor 'coverage'
[GIE]  retrieved output tensor 'bboxes'
[GIE]  configuring CUDA engine
[GIE]  building CUDA engine
[GIE]  completed building CUDA engine
[GIE]  network profiling complete, writing cache to networks/vehicle/snapshot_iter_11970.caffemodel.2.tensorcache
[GIE]  completed writing cache to networks/vehicle/snapshot_iter_11970.caffemodel.2.tensorcache
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel loaded
[GIE]  CUDA engine context initialized with 3 bindings
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel input  binding index:  0
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel input  dims (b=2 c=3 h=384 w=1248) size=11501568
[cuda]  cudaAllocMapped 11501568 bytes, CPU 0x102a00000 GPU 0x102a00000
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel output 0 coverage  binding index:  1
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel output 0 coverage  dims (b=2 c=1 h=24 w=78) size=14976
[cuda]  cudaAllocMapped 14976 bytes, CPU 0x103600000 GPU 0x103600000
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel output 1 bboxes  binding index:  2
[GIE]  networks/vehicle/snapshot_iter_11970.caffemodel output 1 bboxes  dims (b=2 c=4 h=24 w=78) size=59904
[cuda]  cudaAllocMapped 59904 bytes, CPU 0x103800000 GPU 0x103800000
networks/vehicle/snapshot_iter_11970.caffemodel initialized.
[cuda]  cudaAllocMapped 16 bytes, CPU 0x103a00000 GPU 0x103a00000
maximum bounding boxes:  7488
[cuda]  cudaAllocMapped 119808 bytes, CPU 0x103c00000 GPU 0x103c00000
[cuda]  cudaAllocMapped 29952 bytes, CPU 0x10380ea00 GPU 0x10380ea00
loaded image  ./output/7/image_2017_12_20_19_47_35.155268.jpg  (1280 x 720)  14745600 bytes
[cuda]  cudaAllocMapped 14745600 bytes, CPU 0x103e00000 GPU 0x103e00000
detectnet-console:  beginning processing network (1515680421845)
[GIE]  layer deploy_transform input reformatter 0 - 12.334240 ms
[GIE]  layer deploy_transform - 0.610720 ms
[GIE]  layer conv1/7x7_s2 + conv1/relu_7x7 - 12.284480 ms
[GIE]  layer pool1/3x3_s2 - 1.576000 ms
[GIE]  layer pool1/norm1 - 0.451936 ms
[GIE]  layer conv2/3x3_reduce + conv2/relu_3x3_reduce - 0.887680 ms
[GIE]  layer conv2/3x3 + conv2/relu_3x3 - 10.695584 ms
[GIE]  layer conv2/norm2 - 1.293600 ms
[GIE]  layer pool2/3x3_s2 - 1.183040 ms
[GIE]  layer inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce - 1.427776 ms
[GIE]  layer inception_3a/3x3 + inception_3a/relu_3x3 - 2.406400 ms
[GIE]  layer inception_3a/5x5 + inception_3a/relu_5x5 - 0.762880 ms
[GIE]  layer inception_3a/pool - 0.548704 ms
[GIE]  layer inception_3a/pool_proj + inception_3a/relu_pool_proj - 0.502336 ms
[GIE]  layer inception_3a/1x1 copy - 0.088640 ms
[GIE]  layer inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce - 3.002624 ms
[GIE]  layer inception_3b/3x3 + inception_3b/relu_3x3 - 8.457600 ms
[GIE]  layer inception_3b/5x5 + inception_3b/relu_5x5 - 3.920896 ms
[GIE]  layer inception_3b/pool - 0.569600 ms
[GIE]  layer inception_3b/pool_proj + inception_3b/relu_pool_proj - 0.371680 ms
[GIE]  layer inception_3b/1x1 copy - 0.125504 ms
[GIE]  layer pool3/3x3_s2 - 0.504480 ms
[GIE]  layer inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce - 1.642656 ms
[GIE]  layer inception_4a/3x3 + inception_4a/relu_3x3 - 0.680864 ms
[GIE]  layer inception_4a/5x5 + inception_4a/relu_5x5 - 0.160160 ms
[GIE]  layer inception_4a/pool - 0.233056 ms
[GIE]  layer inception_4a/pool_proj + inception_4a/relu_pool_proj - 0.185024 ms
[GIE]  layer inception_4a/1x1 copy - 0.052096 ms
[GIE]  layer inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce - 0.837184 ms
[GIE]  layer inception_4b/3x3 + inception_4b/relu_3x3 - 0.762336 ms
[GIE]  layer inception_4b/5x5 + inception_4b/relu_5x5 - 0.229824 ms
[GIE]  layer inception_4b/pool - 0.248480 ms
[GIE]  layer inception_4b/pool_proj + inception_4b/relu_pool_proj - 0.196416 ms
[GIE]  layer inception_4b/1x1 copy - 0.043104 ms
[GIE]  layer inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce - 0.834656 ms
[GIE]  layer inception_4c/3x3 + inception_4c/relu_3x3 - 0.982400 ms
[GIE]  layer inception_4c/5x5 + inception_4c/relu_5x5 - 0.230624 ms
[GIE]  layer inception_4c/pool - 0.248960 ms
[GIE]  layer inception_4c/pool_proj + inception_4c/relu_pool_proj - 0.196320 ms
[GIE]  layer inception_4c/1x1 copy - 0.034176 ms
[GIE]  layer inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce || inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce - 0.836064 ms
[GIE]  layer inception_4d/3x3 + inception_4d/relu_3x3 - 1.196800 ms
[GIE]  layer inception_4d/5x5 + inception_4d/relu_5x5 - 0.293216 ms
[GIE]  layer inception_4d/pool - 0.248864 ms
[GIE]  layer inception_4d/pool_proj + inception_4d/relu_pool_proj - 0.199040 ms
[GIE]  layer inception_4d/1x1 copy - 0.031296 ms
[GIE]  layer inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce - 1.296384 ms
[GIE]  layer inception_4e/3x3 + inception_4e/relu_3x3 - 1.419680 ms
[GIE]  layer inception_4e/5x5 + inception_4e/relu_5x5 - 0.522336 ms
[GIE]  layer inception_4e/pool - 0.257344 ms
[GIE]  layer inception_4e/pool_proj + inception_4e/relu_pool_proj - 0.358560 ms
[GIE]  layer inception_4e/1x1 copy - 0.067520 ms
[GIE]  layer inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce - 1.980416 ms
[GIE]  layer inception_5a/3x3 + inception_5a/relu_3x3 - 1.429664 ms
[GIE]  layer inception_5a/5x5 + inception_5a/relu_5x5 - 0.525440 ms
[GIE]  layer inception_5a/pool - 0.400160 ms
[GIE]  layer inception_5a/pool_proj + inception_5a/relu_pool_proj - 0.540640 ms
[GIE]  layer inception_5a/1x1 copy - 0.066016 ms
[GIE]  layer inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce - 2.505760 ms
[GIE]  layer inception_5b/3x3 + inception_5b/relu_3x3 - 1.993824 ms
[GIE]  layer inception_5b/5x5 + inception_5b/relu_5x5 - 0.763360 ms
[GIE]  layer inception_5b/pool - 0.401120 ms
[GIE]  layer inception_5b/pool_proj + inception_5b/relu_pool_proj - 0.542400 ms
[GIE]  layer inception_5b/1x1 copy - 0.096480 ms
[GIE]  layer cvg/classifier - 0.358080 ms
[GIE]  layer coverage/sig - 0.013600 ms
[GIE]  layer coverage/sig output reformatter 0 - 0.006816 ms
[GIE]  layer bbox/regressor - 0.352320 ms
[GIE]  layer bbox/regressor output reformatter 0 - 0.009440 ms
[GIE]  layer network time - 90.517365 ms
detectnet-console:  finished processing network  (1515680421947)
3 bounding boxes detected
bounding box 0   (59.326927, 287.094727)  (385.673096, 415.371094)  w=326.346161  h=128.276367
bounding box 1   (270.608978, 300.823975)  (392.243622, 383.437500)  w=121.634644  h=82.613525
bounding box 2   (714.615417, 254.355469)  (1304.743652, 658.769531)  w=590.128235  h=404.414062
draw boxes  3  0   0.000000 200.000000 255.000000 100.000000
detectnet-console:  writing 1280x720 image to './output/7/results/image_2017_12_20_19_47_35.155268.jpg'
detectnet-console:  successfully wrote 1280x720 image to './output/7/results/image_2017_12_20_19_47_35.155268.jpg'

Here is my bash script for running detectnet-console.

#!/bin/bash

# compile the CSV helper (a.out), which gets copied into each output directory
g++ csvCreate.cpp

for dir in ./output/*
do
        cd "$dir"
        mkdir -p results
        cp ../../a.out ./
        ./a.out

        for file in *.jpg
        do
                # detectnet-console lives two levels up, so hop back to the top for each image
                cd ../..
                NET=networks/vehicle
                # $dir and $file are the two extra arguments used by the CSV output
                ./detectnet-console "$dir/$file" "$dir/results/$file" "$dir" "$file" \
                        --prototxt=$NET/deploy.prototxt \
                        --model=$NET/snapshot_iter_11970.caffemodel \
                        --input_blob=data --output_cvg=coverage --output_bbox=bboxes
                cd "$dir"
        done

        cd ../..
done

I modified detectnet-console.cpp to output bounding box coordinates to a .csv file, so I needed two extra arguments. I saved the required arguments for the detectnet call in a new char* array so that the program still works as it was originally designed.
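
(A possible alternative that avoids patching the C++ at all: detectnet-console already prints each box, so the coordinates could be scraped from its console output instead. A sketch, assuming the "bounding box N (x1, y1) (x2, y2) w=... h=..." line format shown in the log above; boxes.csv is a made-up output name.)

# drop-in for the detectnet-console call in the inner loop above
./detectnet-console "$dir/$file" "$dir/results/$file" \
        --prototxt=$NET/deploy.prototxt \
        --model=$NET/snapshot_iter_11970.caffemodel \
        --input_blob=data --output_cvg=coverage --output_bbox=bboxes 2>&1 \
| awk -v img="$file" '/^bounding box/ {
        gsub(/[(),]/, "")       # strip parentheses and commas
        # fields are now: bounding box <n> x1 y1 x2 y2 w=... h=...
        printf "%s,%s,%s,%s,%s,%s\n", img, $3, $4, $5, $6, $7
  }' >> boxes.csv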

It looks like it’s trying to save the tensorcache to ‘networks/vehicle/snapshot_iter_11970.caffemodel.2.tensorcache’

Can you navigate to that location and see if it’s there, or if the directory has been made read-only for some reason?
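
For instance (generic shell checks, nothing specific to jetson-inference):

ls -ld networks/vehicle                  # check the owner and the 'w' permission bits
touch networks/vehicle/.write_test && \
        rm networks/vehicle/.write_test  # fails if the directory isn't writable
chmod u+w networks/vehicle               # restore write access if needed (may require sudo)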

To narrow down the issue, can you also try running one of the pre-trained DetectNet models from the tutorial?

https://github.com/dusty-nv/jetson-inference#pretrained-detectnet-models-available

For example, run this command multiple times to see if it is saving/loading tensorcache correctly:

$ ./detectnet-console bottle_0.jpg output_2.jpg coco-bottle

It should take a minute to load the first time, then load quickly in subsequent runs.
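
For example, timing two back-to-back runs with the shell's time builtin should make the difference obvious:

$ time ./detectnet-console bottle_0.jpg output_2.jpg coco-bottle   # first run: builds and caches the engine
$ time ./detectnet-console bottle_0.jpg output_2.jpg coco-bottle   # second run: should load the tensorcache in about a second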

dusty_nv,

THANK YOU! I knew there was something going on with the program’s ability to save the new model parameters and reuse them on each subsequent attempt to analyze another image.

The directory, for some reason, was read-only, so you were right: it wasn’t saving the tensorcache.

This forum has been outstanding with its help and support.

Thank you!

Hi mascenzi80,

Are you planning on implementing/experimenting with YOLO in TensorRT anytime soon?
Please let me know if you are interested in collaborating.

Thanks,
Bhargav