re : jetson nano start kit : hello world ai : detectnet failure

Hi

Disclaimer: I am an utter rookie in this wonderful technology and science, and I was inspired to try it by the availability of the hardware (Nano) and the dusty-nv basic tutorials (software). At least I am reasonably fluent with Ubuntu as an OS and with C++ (Qt).

  1. The entire setup is pretty easy, since a lot is already prepackaged in the SD card image. So I downloaded and flashed a 32G SD card with the image of the 32.2 version of Ubuntu/JetPack (Saturday 21/07).

  2. I followed the setup meticulously, loaded only the default networks, and did not download PyTorch (yet); I believe PyTorch is only needed for the next step, transfer learning. I cloned the repo, ran cmake, then make and make install.

  3. I attached an RPi camera v2.1 to the system.

  4. Soon enough I was able to run the imagenet-console and imagenet-camera tests, which passed with flying colors. From this I infer that a lot is working on my system.

  5. However, detectnet-console and detectnet-camera fail dismally. After many attempts, I decided to re-flash the SD card and build everything from first principles again, in the hope that I had missed some steps (that's what beginners do!). In short, the debug printout reports that the maximum number of bounding boxes is 0, and finally it prints that detectNet failed to initialize.

  6. I then discovered that detectnet-console and detectnet-camera share the C++ class detectNet, so I started to insert some debug printf messages of my own in order to report where in the init chain the class fails.

Long story short :

  7. The allocDetections() function fails, because mMaxDetections = 0. When cudaAllocMapped is then called, the requested size is zero and the call fails.

  8. I then did something naughty and inserted mMaxDetections = 1, overriding the equations that set the value. I recompiled with make and ran all four of the detectnet-console examples, and all is well: the output JPG images are the same as the reference samples on the website.
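To make step 7 concrete, here is a minimal sketch of how a zero detection count turns into a zero-byte allocation request. It assumes the buffer size is the product sizeof(Detection) * mNumDetectionSets * mMaxDetections, as in the code posted later in this thread. DetectionStub, detectionBufferSize and checkDetectionBufferSize are hypothetical names invented for illustration; they are not part of the library, and the real Detection struct has different fields.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the library's per-detection record.
struct DetectionStub { float left, top, right, bottom, confidence; };

// Buffer size in bytes, following the same product as det_size in the thread.
std::size_t detectionBufferSize(std::size_t numSets, std::size_t maxDetections)
{
    return sizeof(DetectionStub) * numSets * maxDetections;
}

// Returns false when the computed size is zero, so the caller can report the
// real cause (no detections to store) instead of a generic allocation failure.
bool checkDetectionBufferSize(std::size_t numSets, std::size_t maxDetections)
{
    const std::size_t size = detectionBufferSize(numSets, maxDetections);

    if (size == 0) {
        std::printf("max detections = %zu, buffer size = 0, nothing to allocate\n",
                    maxDetections);
        return false;
    }
    return true;
}
```

With maxDetections = 0 the product is zero regardless of the other factors, which matches the symptom described in step 7. Forcing the value to 1, as in step 8, makes the size non-zero again.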

So I assume that I have done something wrong, but I do hope that the changes I have temporarily made on the 'holy ground' of the library source could help somebody understand my problem.

Regards

Anton Reinhardt

Hi Anton, can you provide the text from the terminal log when you run the detectnet-console program on an image? Thanks.

Hi Dusty

Thank you for your response. As requested:

[b]listing 1 : (this is the untampered standard version) :

command = ./detectnet-console peds-003.jpg output.jpg >anton.txt

output :[/b]

detectNet – loading detection network model from:
– prototxt networks/ped-100/deploy.prototxt
– model networks/ped-100/snapshot_iter_70800.caffemodel
– input_blob ‘data’
– output_cvg ‘coverage’
– output_bbox ‘bboxes’
– mean_pixel 0.000000
– mean_binary NULL
– class_labels networks/ped-100/class_labels.txt
– threshold 0.500000
– batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins…
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT] loading network profile from engine cache… networks/ped-100/snapshot_iter_70800.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, networks/ped-100/snapshot_iter_70800.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding – index 0
– name ‘data’
– type FP32
– in/out INPUT
– # dims 3
– dim #0 3 (CHANNEL)
– dim #1 512 (SPATIAL)
– dim #2 1024 (SPATIAL)
[TRT] binding – index 1
– name ‘coverage’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 1 (CHANNEL)
– dim #1 32 (SPATIAL)
– dim #2 64 (SPATIAL)
[TRT] binding – index 2
– name ‘bboxes’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 4 (CHANNEL)
– dim #1 32 (SPATIAL)
– dim #2 64 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=512 w=1024) size=6291456
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=32 w=64) size=8192
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=32 w=64) size=32768
device GPU, networks/ped-100/snapshot_iter_70800.caffemodel initialized.
detectNet – number object classes: 1
detectNet – maximum bounding boxes: 0
detectnet-console: failed to initialize detectNet
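The size values in the binding lines of this listing can be cross-checked by hand, which helps confirm that the network itself loaded correctly and that only the detection count is off. The sketch below assumes every element is a 4-byte FP32 value, as the log reports; bindingBytes and gridCells are made-up helper names, not library functions.

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte FP32 elements");

// Size in bytes of one FP32 binding with the given batch/channel/height/width.
constexpr std::size_t bindingBytes(std::size_t b, std::size_t c,
                                   std::size_t h, std::size_t w)
{
    return b * c * h * w * sizeof(float);
}

// Number of cells in the coverage grid (height x width).
constexpr std::size_t gridCells(std::size_t h, std::size_t w)
{
    return h * w;
}

// Values taken from the log lines in listing 1.
static_assert(bindingBytes(1, 3, 512, 1024) == 6291456, "input 'data'");
static_assert(bindingBytes(1, 1, 32, 64)    == 8192,    "output 'coverage'");
static_assert(bindingBytes(1, 4, 32, 64)    == 32768,   "output 'bboxes'");
static_assert(gridCells(32, 64)             == 2048,    "coverage grid cells");
```

All three sizes agree with the log. If the maximum number of bounding boxes is derived from the coverage grid times the number of classes, as the code posted further down does, one would expect 2048 for a single class rather than 0.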

[b]listing 2 : (this is the standard version with additional debug messages (ar_dbg)) :

command = ./detectnet-console peds-003.jpg output.jpg >anton.txt

output : (only the tail shown) [/b]

[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=512 w=1024) size=6291456
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=32 w=64) size=8192
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=32 w=64) size=32768
device GPU, networks/ped-100/snapshot_iter_70800.caffemodel initialized.
ar_dbg : entering allocDetections
ar_dbg : model type != MODEL_UFF and model type != MODEL_ONXX
detectNet – number object classes: 1
detectNet – maximum bounding boxes: 0
ar_dbg : cudaAllocMapped function failed . aborted
detectnet-console: failed to initialize detectNet

[b]listing 3 : (this is the 'tampered' version, where I set mMaxDetections to 1) :

command = ./detectnet-console peds-003.jpg output.jpg >anton.txt

output : (only the tail shown) [/b]

[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=512 w=1024) size=6291456
[TRT] binding to output 0 coverage binding index: 1
[TRT] binding to output 0 coverage dims (b=1 c=1 h=32 w=64) size=8192
[TRT] binding to output 1 bboxes binding index: 2
[TRT] binding to output 1 bboxes dims (b=1 c=4 h=32 w=64) size=32768
device GPU, networks/ped-100/snapshot_iter_70800.caffemodel initialized.
ar_dbg : entering allocDetections
ar_dbg : model type != MODEL_UFF and model type != MODEL_ONXX
detectNet – number object classes: 1
detectNet – maximum bounding boxes: 1
ar_dbg : leaving allocDetections . return true
detectNet – loaded 1 class info entries
detectNet – number of object classes: 1
[image] loaded ‘peds-003.jpg’ (1024 x 611, 3 channels)
5 objects detected
detected obj 0 class #0 (person) confidence=0.872070
bounding box 0 (692.062500, 43.632202) (841.000000, 459.890869) w=148.937500 h=416.258667
detected obj 1 class #0 (person) confidence=0.899902
bounding box 1 (851.187500, 59.966309) (1014.125000, 490.470703) w=162.937500 h=430.504395
detected obj 2 class #0 (person) confidence=1.076172
bounding box 2 (16.687500, 13.723633) (227.250000, 558.939697) w=210.562500 h=545.216064
detected obj 3 class #0 (person) confidence=0.681152
bounding box 3 (374.250000, 34.756592) (619.109375, 598.320557) w=244.859375 h=563.563965
detected obj 4 class #0 (person) confidence=0.959961
bounding box 4 (549.156250, 130.001587) (617.781250, 319.223633) w=68.625000 h=189.222046

[TRT] ----------------------------------------------
[TRT] Timing Report networks/ped-100/snapshot_iter_70800.caffemodel
[TRT] ----------------------------------------------
[TRT] Pre-Process CPU 0.08740ms CUDA 8.18458ms
[TRT] Network CPU 243.50406ms CUDA 234.84735ms
[TRT] Post-Process CPU 2.01807ms CUDA 1.92625ms
[TRT] Visualize CPU 0.26687ms CUDA 61.72536ms
[TRT] Total CPU 245.87640ms CUDA 306.68356ms
[TRT] ----------------------------------------------

[TRT] note – when processing a single image, run ‘sudo jetson_clocks’ before
to disable DVFS for more accurate profiling/timing measurements

detectnet-console: writing 1024x611 image to ‘output.jpg’
detectnet-console: successfully wrote 1024x611 image to ‘output.jpg’
detectnet-console: shutting down…
detectnet-console: shutdown complete

Below is the code that I have changed:

// allocDetections
bool detectNet::allocDetections()
{
	printf("ar_dbg : entering allocDetections\n");

	// determine max detections
	if( IsModelType(MODEL_UFF) ) // TODO: fixme
	{
		printf("ar_dbg : model type = MODEL_UFF\n");
		printf("W = %u H = %u C = %u\n", DIMS_W(mOutputs[OUTPUT_UFF].dims), DIMS_H(mOutputs[OUTPUT_UFF].dims), DIMS_C(mOutputs[OUTPUT_UFF].dims));
		mMaxDetections = DIMS_H(mOutputs[OUTPUT_UFF].dims) * DIMS_C(mOutputs[OUTPUT_UFF].dims);
	}
	else if( IsModelType(MODEL_ONNX) )
	{
		printf("ar_dbg : model type = MODEL_ONNX\n");
		mMaxDetections = 1;
		mNumClasses = 1;
		printf("detectNet -- using ONNX model\n");
	}
	else
	{
		printf("ar_dbg : model type != MODEL_UFF and model type != MODEL_ONXX\n");
		mMaxDetections = DIMS_W(mOutputs[OUTPUT_CVG].dims) * DIMS_H(mOutputs[OUTPUT_CVG].dims) /** DIMS_C(mOutputs[OUTPUT_CVG].dims)*/ * mNumClasses;
		mNumClasses = DIMS_C(mOutputs[OUTPUT_CVG].dims);
		printf("detectNet -- number object classes: %u\n", mNumClasses);
	}

	//----------------------
	// ar forced values
	//---------------------
	mMaxDetections = 1;

	printf("detectNet -- maximum bounding boxes:  %u\n", mMaxDetections);

	// allocate array to store detection results
	const size_t det_size = sizeof(Detection) * mNumDetectionSets * mMaxDetections;

	if( !cudaAllocMapped((void**)&mDetectionSets[0], (void**)&mDetectionSets[1], det_size) )
	{
		printf("ar_dbg : cudaAllocMapped function failed . aborted\n");
		return false;
	}

	memset(mDetectionSets[0], 0, det_size);

	printf("ar_dbg : leaving allocDetections . return true \n");
	return true;
}
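One possible reading of the posted code, offered as a guess and not as a statement of what the upstream fix changed: in the final else branch, mMaxDetections is computed from mNumClasses one line before mNumClasses is assigned. If mNumClasses is still zero at that point (an assumption; the thread does not show how it is initialized), the product is zero, which would match the "maximum bounding boxes: 0" line in listings 1 and 2. The sketch below isolates that ordering with made-up names (CoverageDims, maxDetectionsAsPosted, maxDetectionsReordered) so it can be compiled without the library.

```cpp
// Hypothetical stand-in for the coverage output dimensions from the log
// (c=1, h=32, w=64 for the ped-100 model).
struct CoverageDims { unsigned c, h, w; };

// Mirrors the order of the posted else branch: numClasses is read
// before it is assigned from the channel count.
unsigned maxDetectionsAsPosted(const CoverageDims& d, unsigned& numClasses)
{
    const unsigned maxDet = d.w * d.h * numClasses;  // uses the old value of numClasses
    numClasses = d.c;
    return maxDet;
}

// Same arithmetic with the two statements swapped: numClasses is set first.
unsigned maxDetectionsReordered(const CoverageDims& d, unsigned& numClasses)
{
    numClasses = d.c;
    return d.w * d.h * numClasses;
}
```

Starting from numClasses = 0 with the dimensions above, the first function returns 0 and the second returns 2048, while both leave numClasses at 1. That is consistent with the log printing "number object classes: 1" right before "maximum bounding boxes: 0".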

Thank you in advance.
Best Regards

Anton

Sorry Anton, looks like I broke that code on Friday - thanks for letting me know. Just checked in the fix on GitHub with commit f98f6b.

If you re-clone the repo, it should be working again without further modification.

Hi Dusty

Thank you for your response and solution. I am new here, but what I have taken from this so far is that you are serving our community in a super awesome manner, so no need to be sorry. I am super excited about this NVIDIA Jetson platform; thank you for bringing it to our doorsteps. I am here to stay.

Best Regards

Anton