TensorRT on the PX2 (4.0.0.8) misbehaves with plugin layers (std::out_of_range)

I have tried to reduce the problem to a minimal case.

The code below works fine with TensorRT 4 or 5 on desktop Linux.
However, an error occurs only on the PX 2 Auto Chauffeur (DRIVE OS 5.0.10.3 Linux SDK for DRIVE PX 2).

1. Simple Network and Frozen Graph (why.pb)
I defined a very minimal network. This generates why.pb.

import tensorflow as tf
slim = tf.contrib.slim

NAME = 'why'
IMAGE_HEIGHT = 5
IMAGE_WIDTH = 5
with tf.Graph().as_default():
    
    # very simple network
    image_ph = tf.placeholder(tf.float32, [1, IMAGE_HEIGHT, IMAGE_WIDTH, 3])
    net = slim.conv2d(image_ph, 3, [3, 3])
    net = slim.conv2d(net, 3, [3, 3])

    branches = []
    for i in range(2):
        with tf.variable_scope('branch_%d' % i):
            net_ = slim.conv2d(net, 3, [3, 3])
            net_ = tf.reshape(net_, [-1, 1])
            branches.append(net_)
    
    # just a simple plugin-layer placeholder: tf.py_func calls merge with
    # one argument per tensor in `branches`, and the returned value must
    # match the declared tf.float32
    import numpy as np
    def merge(b1, b2):
        return np.float32(0)

    net = tf.py_func(merge, branches, tf.float32, name="output")
    net.set_shape((1,))
    

    # frozen graph    
    gpu_config = tf.ConfigProto(allow_soft_placement=True)
    gpu_config.gpu_options.allow_growth = True
    
    with tf.Session(config=gpu_config) as sess:
        
        init = tf.global_variables_initializer()
        sess.run(init)
    
        """ specify tensors that I will need when doing inference """
        output_names = ['output']
        output_tensors = [tf.get_default_graph().get_tensor_by_name(n + ":0") for n in output_names]
        
        graphdef = tf.get_default_graph().as_graph_def()
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, output_names)
        frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)      
        tf.train.write_graph(frozen_graph, '.', NAME + '.pb', as_text=False)

2. Graph Surgery for UFF (why.uff)
For reasons of my own, I want to skip the reshape layers (branch_0/Reshape, branch_1/Reshape) entirely and define my own custom output layer, which takes two inputs.

Below is the graph surgery file (uff-surgery-why.py) for UFF conversion.

import graphsurgeon as gs
import tensorflow as tf

conv_cls1 = gs.create_node("conv_cls1", op="reshape_to_4", dtype=tf.float32)
conv_cls2 = gs.create_node("conv_cls2", op="reshape_to_4", dtype=tf.float32)
output = gs.create_node("output", op="flatten", dtype=tf.float32)

namespace_plugin_map = {
    "output": output,
    "branch_0/Reshape": conv_cls1,
    "branch_1/Reshape": conv_cls2,
}


def preprocess(dynamic_graph):
    dynamic_graph.collapse_namespaces(namespace_plugin_map)

    # drop the placeholder reshape nodes and wire their inputs
    # directly into the custom output plugin
    node = dynamic_graph.find_nodes_by_name('conv_cls1')
    dynamic_graph.forward_inputs(node)
    node = dynamic_graph.find_nodes_by_name('conv_cls2')
    dynamic_graph.forward_inputs(node)
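
To make the intent of the surgery explicit, here is a stdlib-only sketch (not graphsurgeon itself, and the input-node names are my guesses, not taken from why.pb) of the rewiring the script above is meant to perform: the two Reshape nodes are replaced by placeholder nodes, which are then removed with their inputs forwarded to the consumer.

```python
# Simplified view of the relevant part of the graph:
# node name -> list of input node names (input names are hypothetical)
graph = {
    "branch_0/Reshape": ["branch_0/Conv/Relu"],
    "branch_1/Reshape": ["branch_1/Conv/Relu"],
    "output": ["branch_0/Reshape", "branch_1/Reshape"],
}

# step 1: collapse_namespaces replaces the mapped nodes by the
# placeholder plugin nodes (here modeled as a simple rename)
rename = {"branch_0/Reshape": "conv_cls1", "branch_1/Reshape": "conv_cls2"}
graph = {rename.get(name, name): [rename.get(i, i) for i in inputs]
         for name, inputs in graph.items()}

# step 2: forward_inputs removes conv_cls1/conv_cls2 and reconnects
# their inputs directly to any consumer
for removed in ("conv_cls1", "conv_cls2"):
    fwd = graph.pop(removed)
    for name, inputs in graph.items():
        graph[name] = [i2 for i in inputs
                       for i2 in (fwd if i == removed else [i])]

# the custom "output" plugin now consumes the conv outputs directly
print(graph)
```

After both steps the "output" plugin node has the two convolution outputs as its direct inputs, which matches the two 3x5x5 inputs the parser reports on desktop Linux.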

I generated the UFF file by running one of the following:

convert-to-uff tensorflow --input-file why.pb -O output -p uff-surgery-why.py  # TensorRT 4
convert-to-uff --input-file why.pb -O output -p uff-surgery-why.py             # TensorRT 5

3. Inference
Then I ran inference on why.uff with the TensorRT C++ API. The code runs fine with TensorRT 4 or 5 on my laptop and two desktops (all Linux). However, on the PX 2, the UFF parser throws the following error:

# -----------------------
#  PX 2
# -----------------------
Begin parsing model...
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

# -----------------------
#  My desktop, laptop, etc.
# -----------------------
Begin parsing model...
_flatten
Flatten::Flatten()
Flatten::getOutputDimensions()
nbInputDims 2
--input 0, (3, 5, 5, )
--input 1, (3, 5, 5, )
End parsing model...
Begin building engine...
Flatten::configure()
nbInputs 2
--input 0, (3, 5, 5, )
--input 1, (3, 5, 5, )
nbOutputs 1
--output 0, (3, 5, 5, )
Flatten::getWorkspaceSize()
Flatten::getWorkspaceSize()
Flatten::initialize()
End building engine...
Flatten::getSerializationSize()
Flatten::serialize()
Flatten::terminate()
Flatten::~Flatten()
*** deserializing
_flatten_HL_1804289383
Flatten::Flatten()
Flatten::initialize()
engine created
batch size: 1
nbBindings: 2
size of binding 0: 75
size of binding 1: 75
----------input binding 0
----------safe malloc 0, 75
----------safe malloc 1, 75
input image copy done.
 inference takes 2.68461 ms.
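
As a side note on the desktop log: the reported binding sizes are plain per-batch-item element counts. A minimal sketch, assuming TensorRT's usual CHW volume:

```python
from functools import reduce
from operator import mul

def volume(dims):
    """Number of elements for one batch item, as TensorRT reports it."""
    return reduce(mul, dims, 1)

# both the 3x5x5 input and the 3x5x5 plugin output hold 75 floats,
# matching "size of binding 0: 75" and "size of binding 1: 75"
assert volume((3, 5, 5)) == 75
```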

Could you please check this?

Hi Paul,

Thanks for your information and the well-prepared code.

I will try to reproduce your issue and come back to you as soon as I have a solution for it.

  • Fabian

Hi Fabian,

Thanks for taking this issue. I’m looking forward to hearing back.

By the way, as you said, this issue is the same as https://devtalk.nvidia.com/default/topic/1044208/driveworks/px2-tensorrt-problem/ (the one you closed). However, please note that in that posting I made an even more minimal case and attached the relevant files (.pb, .pb.uff, uff.pbtxt, the graph-surgery Python file, and the C++ API inference code) for you.

Additional info: I also found an inconsistency between Linux (TensorRT 4.0.1.6) and the PX 2, which looks highly related. I printed the plugin's FieldCollection as below.

class PluginFactory : public nvinfer1::IPluginFactory, public nvuffparser::IPluginFactory
{
public:
    virtual nvinfer1::IPlugin* createPlugin(const char* layerName, const nvinfer1::Weights* weights,
                                            int nbWeights, const nvuffparser::FieldCollection fc) override
    {
        assert(isPlugin(layerName));

        const nvuffparser::FieldMap* fields = fc.fields;
        int nbFields = fc.nbFields;

        .....
        if (!strcmp(layerName, "_some_plugin")) {
            // print every field name the UFF parser hands to the factory
            for (int i = 0; i < nbFields; i++)
            {
                const char* attr_name = fields[i].name;
                std::cout << i << " " << attr_name << std::endl;
            }
        }
        .....
    }
};

The plugin layer is created in the surgery script as:

slice_xy = gs.create_node("some_plugin", dtype=tf.float32, channels=[1, 2])

Linux output

0 dtype                                                                                              
1 channels

PX2 output

0 channels_u_ilist
1 dtype

Maybe this inconsistency in the field names is what ultimately causes the std::out_of_range? I hope this also helps while you check the issue.
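
If the PX 2 build really reports the field as channels_u_ilist where desktop builds report channels, one defensive workaround on the factory side is to match field names by prefix instead of exact equality. This is only a sketch (written in Python for brevity; the real check would live in the C++ createPlugin above), and the suffix convention is an assumption based purely on the printed names:

```python
def find_field(field_names, wanted):
    """Return the first field whose name is `wanted` exactly or starts
    with `wanted` + "_" (tolerating PX2-style suffixes like "_u_ilist")."""
    for name in field_names:
        if name == wanted or name.startswith(wanted + "_"):
            return name
    return None

# desktop-style field names
assert find_field(["dtype", "channels"], "channels") == "channels"
# PX2-style field names as printed above
assert find_field(["channels_u_ilist", "dtype"], "channels") == "channels_u_ilist"
```

This would make the lookup tolerant of both naming schemes, though of course it only papers over whatever the parser is doing differently on the PX 2.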

Hello,

UFF files are not platform-specific and should work on the PX2 as well.

Thanks for the C++ source. We are triaging now.

Hi Paul,

As written here https://devtalk.nvidia.com/default/topic/1044167/general/a-few-questions-about-tensorrt-versioning/ , the features from TensorRT 4.1.1 to 4.1.2 are not supported on the DPX2 so far.

On AGX, the new samples and multi-layer samples are fully supported.

  • Fabian