[TensorRT] Failure on loading VGG-16 caffe model

yasunori.endo · February 15, 2018, 6:55am

Hi.

I’m going to inference VGG-16 caffemodel from following model zoo.

gist.github.com

https://gist.github.com/ksimonyan/211839e770f7b538e2d8

VGG_ILSVRC_16_layers_deploy.prototxt

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"

This file has been truncated. show original

readme.md

##Information

name: 16-layer model from the arXiv paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition"

caffemodel: VGG_ILSVRC_16_layers

caffemodel_url: http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel

license: see http://www.robots.ox.ac.uk/~vgg/research/very_deep/

This file has been truncated. show original

TensorRT2 fails to load VGG-16 model above.

On the other hand, TensorRT succeeds to load following GoogLeNet model.

The VGG-16 model could be loaded and inferenced with NVCaffe.
So I guess there’s something wrong with TensorRT caffe loader.

Can you give me any advise?

I’m using following inference code.
My code should be correct because GoogLeNet succeeds to inference.

// initialize TensorRT network optimizer
    IBuilder* builder = createInferBuilder(logger_);
    CHECK(builder);

    // parser caffe model
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    CHECK(network);
    CHECK(parser);

    const IBlobNameToTensor* blobNameToTensor = parser->parse(model_file.c_str(),
                                                              trained_file.c_str(),
                                                              *network,
                                                              DataType::kFLOAT);
    CHECK(blobNameToTensor);

    // mark output of network (caffe model doesn't have output info)
    for (auto& s : outputs) {
        std::cout << "[INFO] marking blob " << s << " as output." << std::endl;
        CHECK(blobNameToTensor->find(s.c_str()));
        network->markOutput(*blobNameToTensor->find(s.c_str()));
    }

    // build TensorRT engine
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20);

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    CHECK(engine);

    // destroy used objects
    network->destroy();
    parser->destroy();

    // serialize the engine and close TensorRT optimizer
    IHostMemory* modelStream = engine->serialize();
    engine->destroy();
    builder->destroy();
    shutdownProtobufLibrary();

I got NULL pointer for blobNameToTensor->find(s.c_str()) on line 20 with VGG-16 model.
Also I got segmentation fault on builder->buildCudaEngine(*network); on line 28.

I also tried giexec.
giexec with VGG failed to find output blob.

nvidia@tegra-ubuntu:~/oss/tensorrt_samples/bin$ ./giexec --deploy=$HOME/VGG_ILSVRC_16_layers_deploy.prototxt.txt --output=prob
deploy: /home/nvidia/VGG_ILSVRC_16_layers_deploy.prototxt.txt
output: prob
Input "data": 3x224x224
could not find output blob prob
Engine could not be created
Engine could not be created

giexec with GoogLeNet succeeds.

nvidia@tegra-ubuntu:~/oss/tensorrt_samples/bin$ ./giexec --deploy=/home/nvidia/oss/tensorrt_samples/bin/googlenet_org/googlenet.prototxt --output=prob
deploy: /home/nvidia/oss/tensorrt_samples/bin/googlenet_org/googlenet.prototxt
output: prob
Input "data": 3x224x224
Output "prob": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 32.6127 ms.
Average over 10 runs is 16.0368 ms.
Average over 10 runs is 16.0486 ms.
Average over 10 runs is 16.0474 ms.
Average over 10 runs is 16.1648 ms.
Average over 10 runs is 16.1458 ms.
Average over 10 runs is 16.0479 ms.
Average over 10 runs is 16.1553 ms.
Average over 10 runs is 16.0746 ms.
Average over 10 runs is 16.0705 ms.

I examined prototxt of VGG-16 but I couldn’t find what’s wrong with it.

Can you give me any advise?

AastaLLL · February 26, 2018, 5:42am

Hi,

Could you share some environment information with us?
Do you use TensorRT on an x86 Linux machine or Jetson?

Thanks.

yasunori.endo · February 26, 2018, 2:40pm

Hi,

Thank you for your comment.
I used Jetson TX2 installed with both JetPack 3.1 and 3.2 DP.

Thanks.

AastaLLL · February 27, 2018, 3:32am

Move this topic to TX2 board.

AastaLLL · March 1, 2018, 2:28am

Hi,
Our Caffe parser doesn’t support the legacy format. Please refine your prototxt file into standard Caffe format.

Ex.
Modify this:
layers {
bottom: “data”
top: “conv1_1”
name: “conv1_1”
type: CONVOLUTION
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}

To this:
layer {
bottom: “data”
top: “conv1_1”
name: “conv1_1”
type: “Convolution”
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
}
}

You can also find this information in our document:
http://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#nvcaffeparser
------
Note: NvCaffeParser does not support legacy formats in NVCaffe prototxt; in particular, layer types are expected to be expressed in the prototxt as strings delimited by double quotes.

Thanks.

yasunori.endo · March 1, 2018, 2:39am

Hi,

Thank you for your suggestion.

I got following error when I tried as your suggested.
Do you know why this error occurred?

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format ditcaffe.NetParameter: 11:9: Expected integer or identifier, got: "Convolution"

Here’s head of my deploy.prototxt

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: "Convolution"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: "ReLU"
}

yasunori.endo · March 1, 2018, 3:17am

Hi,

I missed layers → layer.
Following prototxt succeeds for parsing with TensorRT
https://gist.github.com/jo7ueb/c07a629558dcc47fc2ba1bc01345c88e

Thank you for your advice, my problem is cleared!!

AastaLLL · March 2, 2018, 1:49am

Thanks for updating your progress with us : )

Andrey1984 · October 12, 2018, 9:36pm

if I am upgrading the file - shall I update values like RELU to “Relu” ?
proto.txt (4.64 KB)

Andrey1984 · October 12, 2018, 10:10pm

approaching setting up the environment like mentioned in the script.
Starting up NGC Caffe [v2? 1?] docker container with a purpose to approach the script execution, as when I execute it under NGC DIGITS 18.09 it will complain in the following manner:

oot@e121f4ebc4fe:/workspace# mv upg.rade upg.cpp
root@e121f4ebc4fe:/workspace# gcc -o u upg.cpp 
In file included from /usr/include/c++/5/atomic:38:0,
                 from /usr/local/include/caffe/common.hpp:30,
                 from /usr/local/include/caffe/blob.hpp:11,
                 from /usr/local/include/caffe/caffe.hpp:7,
                 from upg.cpp:10:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
In file included from /usr/local/include/caffe/common.hpp:48:0,
                 from /usr/local/include/caffe/blob.hpp:11,
                 from /usr/local/include/caffe/caffe.hpp:7,
                 from upg.cpp:10:
/usr/local/include/caffe/util/device_alternate.hpp:4:23: fatal error: cublas_v2.h: No such file or directory
compilation terminated.

cublas seems missing

Andrey1984 · October 12, 2018, 10:28pm

new caffe container started:

sudo nvidia-docker run -it --rm -v /home/nvidia/data/mnist:/data/mnist nvcr.io/nvi
dia/caffe:18.09-py2
==================
== NVIDIA Caffe ==
==================
NVIDIA Release 18.09 (build 687535)
Container image Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

approaching the upgrade script compilation

Unfortunately, the same error:

usr/local/include/caffe/util/device_alternate.hpp:4:23: fatal error: cublas_v2.h: No such file or directory

But on the other hand:

find / -name cublas_v2.h
/usr/local/cuda-10.0/targets/x86_64-linux/include/cublas_v2.h

However, the issue got resolved the other way, via manual update of style in prototxt file.
Thanks.

drew.scott65 · March 12, 2019, 10:32am

Thanks you solved my problem too

Topic		Replies	Views
Inference VGG16 caffe model can not find the output Jetson Xavier NX jetson-inference	3	863	July 29, 2021
about deploying the caffe model inside Jeston TX2 using TensorRT TensorRT	6	1201	November 19, 2019
VGG_VOC0712_SSD_300x300_iter_120000.caffemodel cannot be found for download Jetson TX2 jetson-inference	4	857	February 6, 2023
why received signal SIGSEGV when import deploy.prototxt with Tensor RT 3.0 DeepStream SDK	10	1889	December 18, 2017
CaffeParser: Could not parse deploy file Segmentation fault (core dumped) TensorRT	6	1340	December 4, 2019
Perfectly fine .caffemodel file not being parsed: assert(blob_name_to_tensor) error TensorRT	5	1064	September 22, 2018
Xavier NX Jetpack 4.4 GA TRT gives wrong results for a specific caffe model Jetson Xavier NX tensorrt	4	770	July 17, 2020
[TRT] jetson agx orion error - CaffeParser: Could not open file device GPU, failed to load networks/Googlenet/bvlc_googlenet.caffemodel Jetson AGX Xavier jetson-inference , tf-trt , tensorrt-model-optimizer	3	147	October 18, 2024
caffeToGIEModel() segmentation fault Jetson TX1	8	1689	December 20, 2017
Inference Yolov3 Caffe Model TensorRT	0	847	January 8, 2020

[TensorRT] Failure on loading VGG-16 caffe model

Related topics