Caffe failed with py-faster-rcnn demo.py on TX1

TimCook · October 30, 2016, 4:13pm

I want to run the Faster R-CNN demo on the TX1.

https://github.com/rbgirshick/py-faster-rcnn

After solving many problems, I am still stuck on an error with gpu_nms.
Has anyone else tried running this Caffe neural network on the TX1?
If you have solved this problem, please give me a hand.
Many thanks.

I am running Python 3.5.1.
Caffe installed cleanly with CUDA 8.0 and OpenCV 3.1.0.

Traceback (most recent call last):
  File "tools/demo.py", line 18, in <module>
    from fast_rcnn.test import im_detect
  File "/home/ubuntu/Desktop/py-faster-rcnn/tools/../lib/fast_rcnn/test.py", line 17, in <module>
    from fast_rcnn.nms_wrapper import nms
  File "/home/ubuntu/Desktop/py-faster-rcnn/tools/../lib/fast_rcnn/nms_wrapper.py", line 9, in <module>
    from nms.gpu_nms import gpu_nms
ImportError: /home/ubuntu/Desktop/py-faster-rcnn/tools/../lib/nms/gpu_nms.so: undefined symbol: _Py_ZeroStruct

TimCook · October 30, 2016, 4:14pm

How can I solve this _Py_ZeroStruct error?

AastaLLL · November 2, 2016, 8:20am

Hi,

Faster R-CNN has its own Caffe repo (it contains some custom layers), and you need to compile the Caffe nested in py-faster-rcnn rather than BVLC Caffe.

Faster R-CNN works well on the Jetson TX1 with 24.2. You can follow these steps:

Use JetPack to install CUDA 8.0, cuDNN v5.1, and OpenCV4Tegra 2.4.13
Clone the code

git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git

caffe and pycaffe

Sync the following files with BVLC caffe in order to support cuDNN v5
→ You can just git clone BVLC caffe and replace the following files directly

include/caffe/util/:
cudnn.hpp

src/caffe/layers/:
cudnn_conv_layer.cu
cudnn_relu_layer.cpp
cudnn_relu_layer.cu
cudnn_sigmoid_layer.cpp
cudnn_sigmoid_layer.cu
cudnn_tanh_layer.cpp
cudnn_tanh_layer.cu

include/caffe/layers/:
cudnn_relu_layer.hpp
cudnn_sigmoid_layer.hpp
cudnn_tanh_layer.hpp

Modify the configuration

cp Makefile.config.example Makefile.config

edit Makefile.config

+++ USE_CUDNN := 1
+++ WITH_PYTHON_LAYER := 1
--- INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
+++ INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

edit Makefile

--- LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
+++ LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
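
The two HDF5 edits above can also be applied by script. The following is only a minimal sketch in Python 3: the function names patch_makefile and patch_makefile_config are hypothetical, and it assumes the Makefile and Makefile.config contain the lines exactly as shown in the diffs above. It works on text, so you still need to read and write the files yourself.

```python
# Hypothetical helpers; the exact Makefile contents are assumed from the diffs above.
HDF5_INCLUDE = "/usr/include/hdf5/serial/"


def patch_makefile(text):
    """Switch the HDF5 library names to the serial variants."""
    return text.replace("hdf5_hl hdf5", "hdf5_serial_hl hdf5_serial")


def patch_makefile_config(text):
    """Append the serial HDF5 include directory to the INCLUDE_DIRS line."""
    patched = []
    for line in text.splitlines():
        if line.startswith("INCLUDE_DIRS :=") and HDF5_INCLUDE not in line:
            line = line.rstrip() + " " + HDF5_INCLUDE
        patched.append(line)
    return "\n".join(patched) + "\n"
```

Both functions are idempotent, so running them twice on the same file should leave it unchanged.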

Build

make
make pycaffe

faster RCNN

cd $FRCN_ROOT/lib
make

cd $FRCN_ROOT
./data/scripts/fetch_faster_rcnn_models.sh

cd $FRCN_ROOT
./tools/demo.py

Possible errors
numpy error:

sudo apt-get install python-numpy

cython error:

sudo apt-get install cython

Checksum incorrect error when running data/scripts/fetch_faster_rcnn_models.sh:
edit data/scripts/fetch_faster_rcnn_models.sh

--- wget $URL -O $FILE
+++ wget --no-check-certificate $URL -O $FILE

ImportError: No module named easydict:

sudo apt-get install python-pip
sudo pip install easydict

locale.Error: unsupported locale setting:

sudo apt-get install language-pack-id
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
sudo dpkg-reconfigure locales

ImportError: No module named cv2:

sudo apt-get install python-opencv

ImportError: No module named skimage.io:

sudo apt-get install libfreetype6-dev
sudo pip install scikit-image

‘GDK_IS_DISPLAY (display)’ failed:

sudo apt-get install libv4l-dev
sudo apt-get install xorg
export DISPLAY=:1
startx &

ImportError: No module named google.protobuf.internal:

sudo apt-get install python-protobuf

ImportError: Cairo backend requires that cairocffi or pycairo is installed:

sudo apt-get install python-dev
sudo apt-get install libffi-dev
sudo pip install cffi
sudo pip install cairocffi

ImportError: No module named yaml:

sudo pip install pyyaml
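
Rather than hitting the ImportErrors above one at a time, you can check all of the modules up front. This is a rough sketch for Python 3.4 or later: the REQUIRED list is assembled from the errors listed above, and the helper name missing_modules is made up for illustration.

```python
import importlib.util

# Module names taken from the ImportError messages listed above.
REQUIRED = ["numpy", "Cython", "easydict", "cv2", "skimage.io",
            "google.protobuf.internal", "yaml"]


def missing_modules(names):
    """Return the names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (e.g. "google") raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print("missing:", name)
```

Run it with the same interpreter you use for tools/demo.py, because packages installed for Python 2 are not visible to Python 3 and vice versa.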

TimCook · November 6, 2016, 11:58am

Thanks for your help, AastaLLL!!

Something in $FRCN_ROOT/lib still causes the error.

I am now using python 3.5.1. I have gone through all the steps above.
But the _Py_ZeroStruct error still exists.

TimCook · November 6, 2016, 1:36pm

In $FRCN_ROOT/lib/setup.py

replace lines 123 and 140 with
+++ include_dirs = ['/usr/lib/python3/dist-packages/numpy/core/include', '/usr/include/python3.5m']

This solves the _Py_ZeroStruct problem.
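
Some background on why this helps: _Py_ZeroStruct is a symbol exported by Python 2 but not by Python 3, so the error suggests gpu_nms.so was compiled against Python 2 headers and then imported from Python 3. The paths above are specific to this setup. As a sketch (the helper name python_include_dirs is hypothetical), you can ask the running interpreter for its own include directories instead of hard-coding them:

```python
import sysconfig


def python_include_dirs():
    """Return header directories that match the interpreter running this script."""
    dirs = [sysconfig.get_paths()["include"]]
    try:
        import numpy
        dirs.insert(0, numpy.get_include())
    except ImportError:
        # NumPy is not installed for this interpreter; only the Python headers are returned.
        pass
    return dirs


print(python_include_dirs())
```

Run it with python3 and compare the output with the include_dirs entries in setup.py.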

TimCook · November 6, 2016, 2:22pm

After running the demo, another error occurs:

Loaded network /home/ubuntu/Desktop/py-faster-rcnn/data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
Killed

How do I solve this?

TimCook · November 7, 2016, 4:00am

Switching to the ZF model avoids this error.

python3 tools/demo.py --net zf

But I still want to know why the VGG16 model doesn't work on the TX1.

TimCook · November 7, 2016, 4:05am

I want to ask the NVIDIA engineers a question.
Is it possible to use TensorRT to train an R-CNN net?
I have learned that TensorRT can run faster than Caffe on the TX1.
Could you help me pass the verification?

AastaLLL · November 7, 2016, 5:19am

Hi,

TensorRT targets inference acceleration, so it doesn’t support back-propagation and can’t be used for training.

If you want to train networks on a desktop, DIGITS is a good choice.

AastaLLL · November 7, 2016, 6:48am

Hi,

I think the VGG16 problem is caused by running out of GPU memory, since the model needs about 2360 MB of GPU memory to run.
You may need to close all unnecessary programs to free enough GPU memory for this model.
It loads successfully in my environment, where I only run demo.py.

You can use this program to query GPU memory usage:

#include <cstdlib>
#include <iostream>
#include <unistd.h>
#include <cuda_runtime.h>

int main()
{
    // Print GPU memory usage once per second until interrupted (Ctrl+C)
    size_t free_byte;
    size_t total_byte;

    while (true)
    {
        // Query free and total device memory in bytes
        cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);

        if (cudaSuccess != cuda_status) {
            std::cout << "Error: cudaMemGetInfo fails, " << cudaGetErrorString(cuda_status) << std::endl;
            exit(1);
        }

        double free_db = (double)free_byte;
        double total_db = (double)total_byte;
        double used_db = total_db - free_db;

        // Convert bytes to MB for display
        std::cout << "GPU memory usage: used = " << used_db/1024.0/1024.0 << " MB, free = "
                  << free_db/1024.0/1024.0 << " MB, total = " << total_db/1024.0/1024.0 << " MB" << std::endl;
        sleep(1);
    }

    return 0;
}

compile with

nvcc test.cu -o test

run as

./test
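
On the TX1 the CPU and GPU share the same 4 GB of physical memory, and a bare "Killed" message usually means the Linux out-of-memory killer stopped the process. So a quick look at /proc/meminfo before starting demo.py can also help. The sketch below is an approximation only: the function names are hypothetical, the 2360 MB default is the VGG16 estimate quoted above, and on kernels that do not report MemAvailable it falls back to MemFree + Cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> size in MB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only fields reported in kB, e.g. "MemFree:  1024000 kB".
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            info[key.strip()] = int(parts[0]) / 1024.0
    return info


def available_mb(info):
    """Estimate the memory that could still be allocated, in MB."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info.get("MemFree", 0.0) + info.get("Cached", 0.0)


def fits(avail_mb, required_mb=2360.0):
    """Return True if the estimate covers the model's requirement."""
    return avail_mb >= required_mb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            avail = available_mb(parse_meminfo(f.read()))
        print("available: %.0f MB, enough for VGG16: %s" % (avail, fits(avail)))
    except OSError:
        print("/proc/meminfo is not readable on this system")
```

If the check fails, closing other programs or switching to the smaller ZF model are the two options discussed in this thread.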

TimCook · November 8, 2016, 3:03am

Thanks for your help. It’s very useful, and I really appreciate it!

DIGITS has really caught my attention. I think it is a highly intuitive tool for training.
Here are some basic questions I would like answered.

Can I use DIGITS to train a complex Caffe model, like R-CNN?
Can its results be used with other frameworks, such as Theano and TensorFlow?

Sorry, I am new to this, but I really want to learn the know-how. Maybe you can walk me through it.
Many thanks.

AastaLLL · November 10, 2016, 2:09am

Hi,

DIGITS does not currently support R-CNN, but it includes DetectNet, a network that predicts object bounding boxes directly.
If your use case is object localization or object detection, DetectNet is a good choice.

Train it on DIGITS:

Run inference with this sample code, which uses TensorRT to speed up inference (it takes a caffemodel as input).

DIGITS currently can’t export models for other frameworks, but you can use a third-party program to convert the caffemodel to your target framework. For example, search for ‘caffe model to tensorflow’; several programs can help you do this.

Feel free to ask any question here : )

Obaid · April 17, 2017, 1:45am

Hi,
I followed the steps up to make in the Build step and got the following:

PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/syncedmem.cpp
In file included from src/caffe/syncedmem.cpp:1:0:
./include/caffe/common.hpp:4:32: fatal error: boost/shared_ptr.hpp: No such file or directory
compilation terminated.
Makefile:564: recipe for target ‘.build_release/src/caffe/syncedmem.o’ failed
make: *** [.build_release/src/caffe/syncedmem.o] Error 1

AastaLLL · April 18, 2017, 4:44am

Hi,

Thanks for your feedback.

This is related to the boost library. Please try this command to check if it is okay.

sudo apt-get install --no-install-recommends libboost-all-dev

Thanks and please let us know the result.

Obaid · April 20, 2017, 5:25pm

Thanks @AastaLLL
It is still the same error. Maybe that is because apt-get update on JetPack 2.3 gave me the following:

W: file:///var/cuda-repo-8-0-local/Release.gpg: Signature by key 889BEE522DA690103C4B085ED88C3D385C37D3BE uses weak digest algorithm (SHA1)
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_cuda-repo-8-0-local_Release
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_libopencv4tegra-repo_Release
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_nv-gie-repo-6-rc-cuda8.0_Release
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_visionworks-repo_Release
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_visionworks-sfm-repo_Release
W: Invalid ‘Date’ entry in Release file /var/lib/apt/lists/_var_visionworks-tracking-repo_Release
E: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/xenial/main/binary-arm64/Packages 404 Not Found [IP: 91.189.91.23 80]
E: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/xenial-updates/main/binary-arm64/Packages 404 Not Found [IP: 91.189.91.23 80]
E: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/xenial-backports/main/binary-arm64/Packages 404 Not Found [IP: 91.189.91.23 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.

AastaLLL · April 24, 2017, 7:01am

Hi,

Thanks for your feedback.
The invalid ‘Date’ warning can be safely ignored.

Let’s track further installation on topic 1004976:
https://devtalk.nvidia.com/default/topic/1004976/jetson-tx1/faster-r-cnn-on-jetson-tx1/

Thanks.

h.ito · January 29, 2018, 12:58pm

Although this is on a TX2 flashed with JetPack 3.1, let me ask for your help.

After installing Faster R-CNN, I could run demo.py with the test images. However, it does not show the images. The error message is: “Couldn’t find foreign struct converter for ‘cairo.Context’”
With the same setup, I also tried to run a video application, and it failed with the error message
“Gtk-ERROR **: GTK+ 2.x symbols detected. Using GTK+ 2.x and GTK+ 3 in the same process is not supported”

kayccc · February 1, 2018, 2:59am

Hi h.ito,

Please file your issue on the TX2 board: https://devtalk.nvidia.com/default/board/188/jetson-tx2/

Thanks
