Jetson Inference DetectNet Problems

jetsonnvidia · June 9, 2020, 9:29pm

I am following this section of the jetson-inference GitHub repository (I already raised an issue there but got no reply):

dusty-nv/jetson-inference/blob/master/docs/detectnet-training.md#creating-detectnet-model-with-digits


# Locating Objects with DetectNet
The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability we're highlighting in this tutorial is detecting objects and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet', an object detection / localization network.

The [`detectNet`](../c/detectNet.h) object accepts as input the 2D image, and outputs a list of coordinates of the detected bounding boxes.  To train the object detection model, first a pretrained ImageNet recognition model (like Googlenet) is used with bounding coordinate labels included in the training dataset in addition to the source imagery.

The following pretrained DetectNet models are included with the tutorial:

1. **ped-100**  (single-class pedestrian detector)
2. **multiped-500**   (multi-class pedestrian + baggage detector)
3. **facenet-120**  (single-class facial recognition detector)
4. **coco-airplane**  (MS COCO airplane class)
5. **coco-bottle**    (MS COCO bottle class)
6. **coco-chair**     (MS COCO chair class)
7. **coco-dog**       (MS COCO dog class)


Sadly I am unable to create the new DetectNet model with DIGITS because soon after I click on “Create”, I get an error:

ERROR: error code -11
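For context, a negative code like this usually means the training process was killed by a signal rather than exiting on its own: Python's `subprocess` module reports a child terminated by signal N as return code -N, and signal 11 on Linux is SIGSEGV. The sketch below only decodes such a code; it assumes DIGITS passes the subprocess return code through unchanged, and the helper name `describe_returncode` is hypothetical.

```python
import signal


def describe_returncode(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N on POSIX systems.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-returncode, name)
    if returncode == 0:
        return "exited successfully"
    return "exited with status {}".format(returncode)


print(describe_returncode(-11))
```

If the decoded signal is SIGSEGV, the crash is in native code (Caffe or one of the libraries it loads), so the last lines of caffe_output.log are the place to look.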

Please note I am using Caffe 0.16 as 0.15 would not build. Here is my caffe_output.log:

caffe_output.log (328.9 KB)

Please can someone help me get this working? All the previous sections including classification with DIGITS worked fine. :-(

Here is a screenshot with more information on my setup:

AastaLLL · June 10, 2020, 2:24am

Hi,

It looks like you are hitting an error similar to the one in this issue:

github.com/NVIDIA/DIGITS

ERROR: error code -11 in detectnet training

opened 10:44PM - 02 Nov 16 UTC

closed 09:04AM - 06 Nov 16 UTC

amlarraz

caffe object-detection

Hi! I am trying to train DetectNet with my own data. I can build the dataset perfectly, and I am following [this](https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection) tutorial to train DetectNet, but at the beginning of training I get the following error: ERROR: error code -11

My configuration is:
Ubuntu 16.04
GeForce GTX 1080
CUDA 8.0
Caffe 0.15.14
NVIDIA driver: 367.57

I installed Caffe from https://github.com/BVLC/caffe and also tried installing nv-caffe from https://github.com/NVIDIA/caffe.git, so I have both installed (nv-caffe gives me an error during make runtest about 3 layers, but I think those are data augmentation layers like detectnet.resize or something similar). Any idea? Thanks in advance!

Would you mind trying the suggestion from that issue's comments first?
Rebuild Caffe with this setting in Makefile.config:

WITH_PYTHON_LAYER := 1
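A quick way to confirm the flag is really active (and not still commented out) is to scan the config text for an uncommented assignment. This is a minimal sketch, not part of Caffe or DIGITS; the function name `flag_enabled` and the sample text are made up for illustration.

```python
import re


def flag_enabled(config_text, flag="WITH_PYTHON_LAYER"):
    """Return True if `flag := 1` (or `flag = 1`) appears on an uncommented line."""
    pattern = re.compile(r"^\s*" + re.escape(flag) + r"\s*:?=\s*1\s*$")
    return any(pattern.match(line) for line in config_text.splitlines())


# Illustrative excerpt; in practice, read the text from your own Makefile.config.
sample = """\
# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1
# USE_NCCL := 1
"""

print(flag_enabled(sample))              # flag set on an uncommented line
print(flag_enabled(sample, "USE_NCCL"))  # flag present only in a comment
```

Note that changing the flag has no effect until Caffe and pycaffe are rebuilt.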

Thanks.

jetsonnvidia · June 10, 2020, 11:24am

Hi, yes, I already have that in my Makefile.config.

Just to be sure, I did a “make clean” and rebuilt Caffe, but the same problem persists. I am using Caffe 0.16, by the way (0.15 did not build correctly), so I hope this is not an issue. As I said, I had no problem doing classification in the previous jetson-inference examples.

When I start DIGITS server I get this message so I hope this is not a cause of the problem:
“Couldn’t import dot_parser, loading of dot files will not be possible.”

AastaLLL · June 24, 2020, 6:08am

Hi,

Sorry for the late update.
Instead of running the “make” command, would you mind resetting the Python binding and trying again?

Here is a similar issue for your reference:

github.com/NVIDIA/DIGITS

Task failed with error code -11

opened 08:20PM - 09 Nov 17 UTC

MonkeyWithAComputer

Using Latest Versions(as of 11-9-17) of Digits and NVCaffe and Opencv 3.2.1. I am following the object detection guide here: https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection

When I create the model I get error code -11 after 2 seconds of running. Ive tried reinstalling different versions of OpenCV, NVCaffe, and DIGITS.

This is the DIGITS GUI output:

Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer bbox_loss (172)
bbox_loss <- bboxes-obj-masked-norm
bbox_loss <- bbox-obj-label-norm
bbox_loss -> loss_bbox
Setting up bbox_loss
TEST Top shape for layer 172 'bbox_loss' (1)
with loss weight 2
Creating layer 'coverage_loss' of type 'EuclideanLoss'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Created Layer coverage_loss (173)
coverage_loss <- coverage_coverage/sig_0_split_0
coverage_loss <- coverage-label_slice-label_4_split_0
coverage_loss -> loss_coverage
Setting up coverage_loss
TEST Top shape for layer 173 'coverage_loss' (1)
with loss weight 1
Creating layer 'cluster' of type 'Python'
Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
Importing Python module 'caffe.layers.detectnet.clustering'

The following error is printed in DIGITS console:

2017-11-09 11:58:06 [20171109-115804-1668] [INFO ] Task subprocess args: "/home/dev/caffe/build/tools/caffe train --solver=/home/dev/DIGITS/digits/jobs/20171109-115804-1668/solver.prototxt --gpu=0,1 --weights=/home/dev/bvlc_googlenet.caffemodel"
2017-11-09 11:58:08 [20171109-115804-1668] [ERROR] Train Caffe Model task failed with error code -11

The Following are the last couple lines of the Caffe log:

I1109 11:36:37.703176 30569 net.cpp:182] Created Layer bbox-obj-norm (171)
I1109 11:36:37.703177 30569 net.cpp:559] bbox-obj-norm <- bboxes-masked-norm
I1109 11:36:37.703179 30569 net.cpp:559] bbox-obj-norm <- obj-block_obj-block_0_split_1
I1109 11:36:37.703182 30569 net.cpp:528] bbox-obj-norm -> bboxes-obj-masked-norm
I1109 11:36:37.703200 30569 net.cpp:243] Setting up bbox-obj-norm
I1109 11:36:37.703203 30569 net.cpp:250] TEST Top shape for layer 171 'bbox-obj-norm' 2 4 24 78 (14976)
I1109 11:36:37.703204 30569 layer_factory.hpp:136] Creating layer 'bbox_loss' of type 'L1Loss'
I1109 11:36:37.703213 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.703222 30569 net.cpp:182] Created Layer bbox_loss (172)
I1109 11:36:37.703223 30569 net.cpp:559] bbox_loss <- bboxes-obj-masked-norm
I1109 11:36:37.703225 30569 net.cpp:559] bbox_loss <- bbox-obj-label-norm
I1109 11:36:37.703228 30569 net.cpp:528] bbox_loss -> loss_bbox
I1109 11:36:37.705127 30569 net.cpp:243] Setting up bbox_loss
I1109 11:36:37.705132 30569 net.cpp:250] TEST Top shape for layer 172 'bbox_loss' (1)
I1109 11:36:37.705133 30569 net.cpp:254] with loss weight 2
I1109 11:36:37.705142 30569 layer_factory.hpp:136] Creating layer 'coverage_loss' of type 'EuclideanLoss'
I1109 11:36:37.705145 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.705149 30569 net.cpp:182] Created Layer coverage_loss (173)
I1109 11:36:37.705152 30569 net.cpp:559] coverage_loss <- coverage_coverage/sig_0_split_0
I1109 11:36:37.705154 30569 net.cpp:559] coverage_loss <- coverage-label_slice-label_4_split_0
I1109 11:36:37.705157 30569 net.cpp:528] coverage_loss -> loss_coverage
I1109 11:36:37.707231 30569 net.cpp:243] Setting up coverage_loss
I1109 11:36:37.707234 30569 net.cpp:250] TEST Top shape for layer 173 'coverage_loss' (1)
I1109 11:36:37.707237 30569 net.cpp:254] with loss weight 1
I1109 11:36:37.707240 30569 layer_factory.hpp:136] Creating layer 'cluster' of type 'Python'
I1109 11:36:37.707242 30569 layer_factory.hpp:148] Layer's types are Ftype:FLOAT Btype:FLOAT Fmath:FLOAT Bmath:FLOAT
I1109 11:36:37.707250 30569 layer_factory.cpp:325] Importing Python module 'caffe.layers.detectnet.clustering'
*** Aborted at 1510256198 (unix time) try "date -d @1510256198" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 30569 (TID 0x7fa6a66ce8c0) from PID 0; stack trace: ***
@ 0x7fa6a38254b0 (unknown)
@ 0x0 (unknown)

![screenshot from 2017-11-09 12-11-28](https://user-images.githubusercontent.com/31083352/32627453-52de2018-c547-11e7-99c4-dca01ea315ea.png)
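The log quoted above stops right after `Importing Python module 'caffe.layers.detectnet.clustering'`, which suggests the crash happens while Python modules are being loaded. One way to narrow that down, sketched here under assumptions, is to import each suspect module in a separate interpreter so that a segfault kills only the child process. The helper `try_import` is hypothetical, and the Caffe modules are only importable if pycaffe is on `PYTHONPATH` for the interpreter you run this with.

```python
import subprocess
import sys


def try_import(module_name, python=sys.executable):
    """Import `module_name` in a child interpreter and return its exit code.

    0 means the import succeeded, 1 usually means an ordinary ImportError,
    and a negative value (POSIX) means the child was killed by a signal.
    """
    return subprocess.call([python, "-c", "import " + module_name])


if __name__ == "__main__":
    # The last name is taken from the Caffe log; the first two are the
    # packages it depends on, checked in dependency order.
    for name in ("google.protobuf", "caffe", "caffe.layers.detectnet.clustering"):
        print("{}: exit code {}".format(name, try_import(name)))
```

An exit code of -11 for one of these imports would point at that module (or a native library it loads) rather than at the network definition or the dataset.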

Thanks.

jetsonnvidia · July 6, 2020, 2:22pm

I read your link and reinstalled the protobuf package using pip. I then tried to recreate the detectnet but this failed in the same way as before.

My protobuf is version 3.12.2 so it is greater than the 3.5 that other people say has problems.

What exactly do you mean by “reset the python binding”?

Is the reinstall of protobuf what you mean?

I think you have prematurely marked this as solved. It is definitely not solved from my perspective.

AastaLLL · July 22, 2020, 6:05am

Hi,

Sorry for the late update.

After reinstalling protobuf, please also recompile the Caffe Python library so that it picks up the change.
Could you give it a try and let us know the result?

Thanks.

jetsonnvidia · July 23, 2020, 6:23pm

I just did a make clean on Caffe and rebuilt pycaffe etc.

Still get the same error.

The only reason I can see for this error is that I am using Caffe 0.16 instead of the recommended 0.15. However, 0.15 won’t even build, so I have no choice.

At the bottom of this message is my Caffe Makefile.config, in the hope that it is of some help.

Ultimately, detecting where objects are in an image is not as important as classifying them in my scenario so not the end of the world if we can’t get this working.

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
# cuDNN version 6 or higher is required.
USE_CUDNN := 1

# NCCL acceleration switch (uncomment to build with NCCL)
# See https://github.com/NVIDIA/nccl
# USE_NCCL := 1

# Builds tests with 16 bit float support in addition to 32 and 64 bit.
# TEST_FP16 := 1

# uncomment to disable IO dependencies and corresponding data layers
# USE_OPENCV := 0
# USE_LEVELDB := 0
# USE_LMDB := 0

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
CUDA_ARCH := 	-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_52,code=sm_52 \
		-gencode arch=compute_60,code=sm_60 \
		-gencode arch=compute_61,code=sm_61 \
		-gencode arch=compute_61,code=compute_61

# BLAS choice:
# atlas for ATLAS
# mkl for MKL
# open for OpenBlas - default, see https://github.com/xianyi/OpenBLAS
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
BLAS_INCLUDE := /opt/OpenBLAS/include/
BLAS_LIB := /opt/OpenBLAS/lib/

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
		/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
		# $(ANACONDA_HOME)/include/python2.7 \
		# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# Uncomment to use Python 3 (default is Python 2)
# PYTHON_LIBRARIES := boost_python3 python3.5m
# PYTHON_INCLUDE := /usr/include/python3.5m \
#                 /usr/lib/python3.5/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

# shared object suffix name to differentiate branches
LIBRARY_NAME_SUFFIX := -nv

AastaLLL · July 24, 2020, 6:25am

Hi,

We are going to try to reproduce this issue in our environment.
Would you mind sharing your host setup with us? Is it Ubuntu 18.04?

Thanks.

jetsonnvidia · July 24, 2020, 3:50pm

Thanks for investigating this.

I am using Debian Buster (i.e. stable). Caffe is built from source as mentioned, version 0.16.

I am using Nvidia driver 440.82 with a GTX-1070. Here are some versions of software I am using:

CUDA 10.2
tensorflow-gpu 1.14.0
protobuf 3.12.2
CuDNN packages: libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb

Hopefully that is everything you need to know but ask me if I missed off some information.

AastaLLL · August 5, 2020, 7:20am

Hi,

Thanks for your feedback.
We are still checking this issue and will share more information with you later.

Thanks.

jetsonnvidia · August 5, 2020, 8:24am

Okay, thanks.

AastaLLL · August 19, 2020, 7:25am

Hi,

Here is a status update.

We can reproduce this issue on a standard Ubuntu 18.04 desktop,
and we have passed it to our internal team for suggestions.

Thanks.

jetsonnvidia · August 19, 2020, 8:48am

Great to hear that. Thanks for investigating.

AastaLLL · September 4, 2020, 6:50am

Hi,

Sorry to keep you waiting.
This issue comes from Caffe itself rather than DIGITS.
Please switch your protobuf library to v3.1.0.
With this change, the training job runs correctly in our environment.

$ sudo -H pip install --upgrade protobuf==3.1.0.post1
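After running the command, it can be worth confirming which protobuf version the Python interpreter actually picks up. Note that version strings should be compared numerically: as plain strings, "3.12.2" sorts before "3.5". The sketch below uses hypothetical helper names (`release_tuple`, `matches_pin`) and reads `google.protobuf.__version__` only if the package is importable.

```python
def release_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "post1"
        parts.append(int(piece))
    return tuple(parts)


def matches_pin(installed, pinned="3.1.0"):
    """True if the installed release equals the pinned release (suffix ignored)."""
    return release_tuple(installed) == release_tuple(pinned)


print(release_tuple("3.1.0.post1"))
print(release_tuple("3.12.2") > release_tuple("3.5"))

try:
    import google.protobuf
    installed = google.protobuf.__version__
    print("installed protobuf:", installed, "matches pin:", matches_pin(installed))
except ImportError:
    print("protobuf is not importable from this interpreter")
```

Run it with the same interpreter that DIGITS and pycaffe use, since a system-wide `pip` and a virtualenv can see different protobuf installs.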

Thanks.

jetsonnvidia · September 11, 2020, 6:35pm

Hi, I will be unable to test this suggestion for the next few weeks but I will respond once I am able to. Thanks for looking into this.

jetsonnvidia · December 1, 2020, 11:10pm

Hi, I just wanted to say that I see DIGITS is now deprecated so this thread is no longer relevant. I’ll bear in mind your answer for the future though. Thanks.

AastaLLL · December 10, 2020, 2:55am

Thanks for the feedback : )
