TensorRT on PX2 (4.0.0.8) misbehaves with plugin layers (std::out_of_range)

I have tried to reduce the problem to a minimal reproduction.

The code below works fine with the Linux versions of TensorRT 4 or 5.
However, an error occurs only on the PX2 AutoChauffeur (DRIVE OS 5.0.10.3 Linux SDK for DRIVE PX 2).

1. Simple Network and Frozen Graph (why.pb)
I defined a very minimal network; the script below generates why.pb.

import numpy as np
import tensorflow as tf
slim = tf.contrib.slim

NAME = 'why'
IMAGE_HEIGHT = 5
IMAGE_WIDTH = 5
with tf.Graph().as_default():
    
    # very simple network
    image_ph = tf.placeholder(tf.float32, [1, IMAGE_HEIGHT, IMAGE_WIDTH, 3])
    net = slim.conv2d(image_ph, 3, [3, 3])
    net = slim.conv2d(net, 3, [3, 3])

    branches = []
    for i in range(2):
        with tf.variable_scope('branch_%d' % i):
            net_ = slim.conv2d(net, 3, [3, 3])
            net_ = tf.reshape(net_, [-1, 1])
            branches.append(net_)
    
    # stand-in for the custom plugin layer (replaced during graph surgery);
    # py_func passes one argument per tensor in `branches`
    def merge(b1, b2):
        return np.float32(0)

    net = tf.py_func(merge, branches, tf.float32, name="output")
    net.set_shape((1,))

    # frozen graph
    gpu_config = tf.ConfigProto(allow_soft_placement=True)
    gpu_config.gpu_options.allow_growth = True
    
    with tf.Session(config=gpu_config) as sess:
        
        init = tf.global_variables_initializer()
        sess.run(init)
    
        """ specify tensors that I will need when doing inference """
        output_names = ['output']
        output_tensors = [tf.get_default_graph().get_tensor_by_name(n + ":0") for n in output_names]
        
        graphdef = tf.get_default_graph().as_graph_def()
        frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, output_names)
        frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)      
        tf.train.write_graph(frozen_graph, '.', NAME + '.pb', as_text=False)

2. Graph Surgery for UFF (why.uff)
For my use case, I want to skip the reshape layers (branch_0/Reshape, branch_1/Reshape) and define my own custom output layer, which takes two inputs.

Below is the graph surgery file (uff-surgery-why.py) for UFF conversion.

import graphsurgeon as gs
import tensorflow as tf

conv_cls1 = gs.create_node("conv_cls1", op="reshape_to_4", dtype=tf.float32)
conv_cls2 = gs.create_node("conv_cls2", op="reshape_to_4", dtype=tf.float32)
output = gs.create_node("output", op="flatten", dtype=tf.float32)

namespace_plugin_map = {
    "output": output,
    "branch_0/Reshape": conv_cls1,
    "branch_1/Reshape": conv_cls2,
}

def preprocess(dynamic_graph):
    # swap the reshape layers and the py_func output for the plugin nodes
    dynamic_graph.collapse_namespaces(namespace_plugin_map)

    # then remove the reshape plugins entirely, forwarding their inputs
    # straight through to their consumers
    nodes = dynamic_graph.find_nodes_by_name('conv_cls1')
    dynamic_graph.forward_inputs(nodes)
    nodes = dynamic_graph.find_nodes_by_name('conv_cls2')
    dynamic_graph.forward_inputs(nodes)

I generated the UFF file by running one of the following:

convert-to-uff tensorflow --input-file why.pb -O output -p uff-surgery-why.py  # TensorRT 4
convert-to-uff --input-file why.pb -O output -p uff-surgery-why.py             # TensorRT 5
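
For reference, the same conversion can also be done from Python through the uff package; a rough sketch (I have not verified that the parameter names are identical across converter versions):

import uff

# programmatic equivalent of the convert-to-uff call above;
# "uff-surgery-why.py" is the same graph-surgery script shown in step 2
uff.from_tensorflow_frozen_model(
    frozen_file="why.pb",
    output_nodes=["output"],
    preprocessor="uff-surgery-why.py",
    output_filename="why.uff")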

3. Inference
Then I ran inference with why.uff through the C++ API. The code runs fine with TensorRT 4 or 5 on my laptop and on two desktops (all Linux). However, when I run it on the PX 2, the UFF parser throws the error below.

# -----------------------
#  PX 2
# -----------------------
Begin parsing model...
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)

# -----------------------
#  My desktop, laptop, etc.
# -----------------------
Begin parsing model...
_flatten
Flatten::Flatten()
Flatten::getOutputDimensions()
nbInputDims 2
--input 0, (3, 5, 5, )
--input 1, (3, 5, 5, )
End parsing model...
Begin building engine...
Flatten::configure()
nbInputs 2
--input 0, (3, 5, 5, )
--input 1, (3, 5, 5, )
nbOutputs 1
--output 0, (3, 5, 5, )
Flatten::getWorkspaceSize()
Flatten::getWorkspaceSize()
Flatten::initialize()
End building engine...
Flatten::getSerializationSize()
Flatten::serialize()
Flatten::terminate()
Flatten::~Flatten()
*** deserializing
_flatten_HL_1804289383
Flatten::Flatten()
Flatten::initialize()
engine created
batch size: 1
nbBindings: 2
size of binding 0: 75
size of binding 1: 75
----------input binding 0
----------safe malloc 0, 75
----------safe malloc 1, 75
input image copy done.
 inference takes 2.68461 ms.

Could you please check this?

Hello, we are triaging and will keep you updated. Question: what is the GPU type on your Linux host?

Hello,

UFF files are not platform specific and should work on PX2 as well.

Per engineering: can we get the full repro, including the C++ sample being used and the plugin files?

Are you using the plugin factory or the plugin registry for your plugins with TensorRT?

You can DM me if you don’t want to upload publicly.

Hi NVES,
Thanks for taking a look at this.

At the time of posting there was no answer, so I posted this question both here and on the PX 2 forum.
Would you please look at:
https://devtalk.nvidia.com/default/topic/1044208/driveworks/px2-tensorrt-problem/
It contains even more minimal examples (pb, pb.uff, pb.uff.txt, graph-surgery.py, C++ inference code).

By the way, I thought UFF files were platform-independent and engines created from UFF were platform-specific. Maybe I am wrong?

I am really looking forward to hearing back. Thank you so much!

FYI, I tested several Linux systems (a desktop with 4 Titan Xs, a desktop with a 1080, a laptop with a 1080), all with TensorRT 4.0.1.6 (of course, TensorRT 5 also works).

It doesn’t work with PX 2 (maybe TensorRT 4.0.0.3).

Additional info: I also found an inconsistency between Linux (TensorRT 4.0.1.6) and the PX 2 that looks highly related. I printed the FieldMap entries as below.

class PluginFactory : public nvinfer1::IPluginFactory, public nvuffparser::IPluginFactory
{
public:
    virtual nvinfer1::IPlugin* createPlugin(const char* layerName, const nvinfer1::Weights* weights,
                                            int nbWeights, const nvuffparser::FieldCollection fc) override
    {
        assert(isPlugin(layerName));

        const nvuffparser::FieldMap* fields = fc.fields;
        int nbFields = fc.nbFields;

        .....
        if (!strcmp(layerName, "_some_plugin")) {
            // print every field name the parser hands to the factory
            for (int i = 0; i < nbFields; i++)
            {
                const char* attr_name = fields[i].name;
                std::cout << i << " " << attr_name << std::endl;
            }
        }
        .....
    }
};
The plugin layer in question:

slice_xy = gs.create_node("some_plugin", dtype=tf.float32, channels=[1, 2])

Linux output

0 dtype
1 channels

PX2 output

0 channels_u_ilist
1 dtype

Maybe this inconsistency in field names ultimately results in the “std::out_of_range”? (The _Map_base::at in the what() message comes from std::unordered_map::at on a missing key, which would fit a field-name lookup that fails.)
I hope this also helps while you check the issue.

Hello,

Per engineering:
Can you try with

gs.create_plugin_node("some_plugin", dtype=tf.float32, channels=[1, 2])

as opposed to

gs.create_node("some_plugin", dtype=tf.float32, channels=[1, 2]).

That will append the type information to the plugin field name (which the parser uses to figure out the type). The parser then strips the type information out and passes the field to the createPlugin function.
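
For illustration, the only change in the surgery script would be the constructor; a sketch:

import graphsurgeon as gs
import tensorflow as tf

# create_plugin_node serializes each field with a type suffix in its name
# (e.g. "channels" becomes "channels_u_ilist" for a list of ints, as seen in
# the PX2 output above); the parser strips that suffix, uses it to decode the
# value, and passes the clean field name on to createPlugin()
slice_xy = gs.create_plugin_node("some_plugin", dtype=tf.float32, channels=[1, 2])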

Also, for this use case we recommend moving to TRT 5.0 GA, using the plugin registry, and registering the plugin there.

Hi NVES, thanks for the reply. However, please consider these things:

  1. The whole point here is to use the Drive PX 2, whose latest OS only includes TensorRT 4.
  2. In TensorRT 4 (as I remember), there was no “create_plugin_node” function.
  3. In TensorRT 4.0.1.6 (Linux), “create_node” works well enough anyway. Only Drive PX 2’s TensorRT has this issue.

If I could use TensorRT 4.0.1.6 or TensorRT 5, I am fully aware that there would be no issue :)
The question is why the Drive PX 2 has this issue, and how I can solve it.

Hello pualkwon,

Understood.

Per engineering: can you check what the output of the FieldType is, along with the other plugin field entries?
This is what the parser does when it doesn’t recognize the field type, and I suspect that since it cannot deduce the type, it is not able to allocate the memory correctly.

Hello,

Per engineering: GraphSurgeon is platform-independent. Is it possible that the user has different versions of GraphSurgeon installed on the Linux system vs. the PX2?

Can you try using the UFF file generated on Linux on the PX2, rather than converting on the board directly?

Hi NVES,

I also suspected this could be due to different versions of GraphSurgeon between Linux and PX2. Note that I generated the UFF on Linux (desktop), and it worked for desktop TensorRT.

Since the PX2’s TensorRT doesn’t parse the UFF, I was going to generate the UFF on the PX2 for consistency. However, I realized that the PX2’s TensorRT doesn’t ship any Python APIs, the “convert-to-uff” script, or the sampleUffSSD example. The PX2’s TensorRT version seems quite a bit older than the downloadable 4.0.1.6 (for general Linux systems).

A related question: do you think I can generate the UFF directly on the board?
Another related question: why does the “latest” PX 2 OS ship such an old TensorRT?

(FYI, I posted a versioning question earlier here:
https://devtalk.nvidia.com/default/topic/1044167/general/a-few-questions-about-tensorrt-versioning/ )

Hello,

Apologies for the delay; I have some updates.

Engineering has fixed this issue; the fix should be available in a future TRT release, which will quickly make its way into DriveOS.

In the meantime, as a workaround, our engineers recommend designing the plugin layer such that the weights to the plugin are const nodes. It looks like the issue is happening because the parser tries to convert non-const nodes to weights and cannot handle that.

In the case of the customer’s plugin,

`slice_xy = gs.create_node("some_plugin", dtype=tf.float32, channels=[1, 2])`

, our engineers recommend trying with the channels field actually being a constant layer input and removing the dtype field.
This is handled much better in later versions of the UFF parser, but for 4.0 we have some limitations with plugins.
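
For illustration only, the TensorFlow side of such a graph could look roughly like this (a hypothetical sketch; slice_fn and the node names are placeholders, not part of any API):

import numpy as np
import tensorflow as tf

with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [1, 5, 5, 3])

    # expose "channels" as a Const node feeding the plugin placeholder instead
    # of passing it as a plugin field; after freezing, the parser can convert
    # the Const input to weights and hand them to createPlugin() through the
    # weights/nbWeights arguments
    channels = tf.constant([1, 2], dtype=tf.int32, name="some_plugin_channels")

    def slice_fn(x, channels):
        # placeholder body; replaced by the real plugin during graph surgery
        return np.float32(0)

    net = tf.py_func(slice_fn, [x, channels], tf.float32, name="some_plugin")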

regards,
NVIDIA Enterprise Support