[TensorRT] Failure on loading VGG-16 caffe model

Hi.

I'm trying to run inference with the VGG-16 caffemodel from the following model zoo:
https://gist.github.com/ksimonyan/211839e770f7b538e2d8

TensorRT 2 fails to load the VGG-16 model above.

On the other hand, TensorRT successfully loads the following GoogLeNet model:
https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet

The VGG-16 model can be loaded and run with NVCaffe, so I suspect something is wrong with the TensorRT Caffe parser.

Can you give me any advice?


I'm using the following inference code.
The code itself should be correct, because inference with GoogLeNet succeeds.

    // initialize the TensorRT network optimizer
    IBuilder* builder = createInferBuilder(logger_);
    CHECK(builder);

    // parse the Caffe model
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    CHECK(network);
    CHECK(parser);

    const IBlobNameToTensor* blobNameToTensor = parser->parse(model_file.c_str(),
                                                              trained_file.c_str(),
                                                              *network,
                                                              DataType::kFLOAT);
    CHECK(blobNameToTensor);

    // mark output of network (caffe model doesn't have output info)
    for (auto& s : outputs) {
        std::cout << "[INFO] marking blob " << s << " as output." << std::endl;
        ITensor* tensor = blobNameToTensor->find(s.c_str());
        CHECK(tensor);
        network->markOutput(*tensor);
    }

    // build TensorRT engine
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20);

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    CHECK(engine);

    // destroy used objects
    network->destroy();
    parser->destroy();

    // serialize the engine and close TensorRT optimizer
    IHostMemory* modelStream = engine->serialize();
    engine->destroy();
    builder->destroy();
    shutdownProtobufLibrary();

With the VGG-16 model, blobNameToTensor->find(s.c_str()) returns a NULL pointer.
I also get a segmentation fault in builder->buildCudaEngine(*network).

I also tried giexec.
With VGG-16, giexec fails to find the output blob:

nvidia@tegra-ubuntu:~/oss/tensorrt_samples/bin$ ./giexec --deploy=$HOME/VGG_ILSVRC_16_layers_deploy.prototxt.txt --output=prob
deploy: /home/nvidia/VGG_ILSVRC_16_layers_deploy.prototxt.txt
output: prob
Input "data": 3x224x224
could not find output blob prob
Engine could not be created
Engine could not be created

giexec with GoogLeNet succeeds:

nvidia@tegra-ubuntu:~/oss/tensorrt_samples/bin$ ./giexec --deploy=/home/nvidia/oss/tensorrt_samples/bin/googlenet_org/googlenet.prototxt --output=prob
deploy: /home/nvidia/oss/tensorrt_samples/bin/googlenet_org/googlenet.prototxt
output: prob
Input "data": 3x224x224
Output "prob": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 32.6127 ms.
Average over 10 runs is 16.0368 ms.
Average over 10 runs is 16.0486 ms.
Average over 10 runs is 16.0474 ms.
Average over 10 runs is 16.1648 ms.
Average over 10 runs is 16.1458 ms.
Average over 10 runs is 16.0479 ms.
Average over 10 runs is 16.1553 ms.
Average over 10 runs is 16.0746 ms.
Average over 10 runs is 16.0705 ms.

I examined the VGG-16 prototxt but couldn't find anything wrong with it.

Can you give me any advice?

Hi,

Could you share some environment information with us?
Do you use TensorRT on an x86 Linux machine or a Jetson?

Thanks.

Hi,

Thank you for your comment.
I'm using a Jetson TX2; I tried with both JetPack 3.1 and 3.2 DP installed.

Thanks.

Moving this topic to the TX2 board.

Hi,
Our Caffe parser doesn't support the legacy format. Please convert your prototxt file to the standard Caffe format.

Ex.
Modify this:
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}

To this:
layer {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: "Convolution"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}

You can also find this information in our document:
http://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#nvcaffeparser
------
Note: NvCaffeParser does not support legacy formats in NVCaffe prototxt; in particular, layer types are expected to be expressed in the prototxt as strings delimited by double quotes.
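For illustration, the renames above (the block name `layers` becoming `layer`, and enum layer types becoming quoted strings) can be sketched as a small shell filter. This is a minimal sketch covering only the layer types VGG-16 uses; Caffe's own upgrade tooling is the robust route for arbitrary models.

```shell
# Mechanically upgrade a legacy (V1) prototxt read on stdin to the
# standard (V2) format: rename "layers {" blocks to "layer {" and
# turn enum layer types into quoted strings.
# Covers only the types VGG-16 uses; extend the list as needed.
upgrade_prototxt() {
    sed -e 's/^layers {/layer {/' \
        -e 's/type: CONVOLUTION/type: "Convolution"/' \
        -e 's/type: RELU/type: "ReLU"/' \
        -e 's/type: POOLING/type: "Pooling"/' \
        -e 's/type: INNER_PRODUCT/type: "InnerProduct"/' \
        -e 's/type: DROPOUT/type: "Dropout"/' \
        -e 's/type: SOFTMAX/type: "Softmax"/'
}
```

Usage: `upgrade_prototxt < VGG_ILSVRC_16_layers_deploy.prototxt > vgg16_upgraded.prototxt`.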

Thanks.

Hi,

Thank you for your suggestion.

I got the following error when I tried your suggestion.
Do you know why this error occurs?

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format ditcaffe.NetParameter: 11:9: Expected integer or identifier, got: "Convolution"

Here's the head of my deploy.prototxt:

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: "Convolution"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: "ReLU"
}

Hi,

I had missed changing layers -> layer.
The following prototxt now parses successfully with TensorRT:
https://gist.github.com/jo7ueb/c07a629558dcc47fc2ba1bc01345c88e

Thank you for your advice, my problem is solved!

Thanks for updating us on your progress : )

If I am upgrading the file, should I update values like RELU to "ReLU"?
proto.txt (4.64 KB)
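Note that in the standard format the type string is "ReLU", not "Relu" (the new-format names generally match the layer class names, e.g. INNER_PRODUCT becomes "InnerProduct"). As a quick sanity check before parsing, here is a hedged sketch that flags any legacy constructs still left in a prototxt read on stdin:

```shell
# Flag legacy (V1) constructs remaining in a prototxt read on stdin:
# "layers {" block names, and unquoted enum types such as RELU.
# Prints matching lines with line numbers; prints nothing when clean.
check_legacy_prototxt() {
    grep -nE '^[[:space:]]*layers[[:space:]]*\{|type:[[:space:]]*[A-Z_]+[[:space:]]*$'
}
```

Usage: `check_legacy_prototxt < proto.txt` lists the lines that still need updating; quoted types such as `type: "ReLU"` are not flagged.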

I'm setting up the environment as mentioned in the script.
I'm starting an NGC Caffe [v2? 1?] docker container in order to run the upgrade script, because when I execute it under NGC DIGITS 18.09 it complains as follows:

root@e121f4ebc4fe:/workspace# mv upg.rade upg.cpp
root@e121f4ebc4fe:/workspace# gcc -o u upg.cpp 
In file included from /usr/include/c++/5/atomic:38:0,
                 from /usr/local/include/caffe/common.hpp:30,
                 from /usr/local/include/caffe/blob.hpp:11,
                 from /usr/local/include/caffe/caffe.hpp:7,
                 from upg.cpp:10:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
In file included from /usr/local/include/caffe/common.hpp:48:0,
                 from /usr/local/include/caffe/blob.hpp:11,
                 from /usr/local/include/caffe/caffe.hpp:7,
                 from upg.cpp:10:
/usr/local/include/caffe/util/device_alternate.hpp:4:23: fatal error: cublas_v2.h: No such file or directory
compilation terminated.

The cuBLAS header seems to be missing.

I started a new Caffe container:

sudo nvidia-docker run -it --rm -v /home/nvidia/data/mnist:/data/mnist nvcr.io/nvidia/caffe:18.09-py2
==================
== NVIDIA Caffe ==
==================
NVIDIA Release 18.09 (build 687535)
Container image Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

Now attempting to compile the upgrade script.

Unfortunately, the same error:

/usr/local/include/caffe/util/device_alternate.hpp:4:23: fatal error: cublas_v2.h: No such file or directory

But on the other hand:

find / -name cublas_v2.h
/usr/local/cuda-10.0/targets/x86_64-linux/include/cublas_v2.h
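For what it's worth, the two compile errors point at missing compiler flags rather than a broken install: the first error asks for -std=c++11 explicitly, and the second means the cuBLAS header directory (the one `find` located above) is not on the include path. A hedged sketch of a compile-only invocation under those assumptions (the include path is the one reported in this container; link flags are left out via -c):

```shell
# Compile-only (-c) to sidestep guessing the link flags; g++ rather
# than gcc so the file is treated as C++ throughout. The include path
# is the directory that `find` reported for cublas_v2.h above.
g++ -std=c++11 \
    -I/usr/local/cuda-10.0/targets/x86_64-linux/include \
    -c upg.cpp -o upg.o
```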

However, the issue was resolved another way, by manually updating the prototxt file's style.
Thanks.

Thanks, you solved my problem too!