ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 ...

I am trying to reproduce the ssd_inception_v2 frozen graph provided in the model zoo.

These are the exact steps I am following:

  • Training ssd_inception_v2 model with the following command:
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
    --alsologtostderr
  • Exporting the trained model for inference with the following command:
    INPUT_TYPE=image_tensor
    PIPELINE_CONFIG_PATH={path to pipeline config file}
    TRAINED_CKPT_PREFIX={path to model.ckpt}
    EXPORT_DIR={path to folder that will be used for export}
    python object_detection/export_inference_graph.py \
        --input_type=${INPUT_TYPE} \
        --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
        --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
        --output_directory=${EXPORT_DIR}
    

    The above command generates the frozen graph of the exported model. The issue is that, when visualized in TensorBoard, the custom_ssd_inception graph is very different from the one in the ssd_inception_v2_coco_2018_01_28 file from the object detection model zoo.

    Link to files:

    Custom_ssd_inception (pb): https://down.uploadfiles.io/get/uvg4m
    Original_ssd_inception (pb): http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz

    When I then converted the frozen graphs to UFF using convert_to_uff.py, the output logs were as follows:

    For the original model, given in model zoo: http://txt.do/dw3z6
    For custom model: http://txt.do/dw3zj

    When I plug in the UFF model exported from the custom model (the one I trained), I get the following error:

    ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 time
    ERROR: sample_uff_ssd: Fail to parse

    Check the entire log here: http://txt.do/dw8b5

    How do I fix this issue?

    Hello, we are triaging this. Can you provide details on the platforms you are using?

    Linux distro and version
    GPU type
    nvidia driver version
    CUDA version
    CUDNN version
    Python version [if using python]
    Tensorflow version
    TensorRT version

    Hello,

    Did you follow the instructions in the sample README?

    Steps to generate UFF file:
        0. Make sure you have the UFF converter installed. For installation instructions, see:
            https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/#python and click on the 'TensorRT Python API' link.
    
        1. Get the pre-trained Tensorflow model (ssd_inception_v2_coco) from:
            http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz
    
        2. Call the UFF converter with the preprocessing flag set (-p [config_file]).
            The config.py script specifies the preprocessing operations necessary for SSD TF graph.
            It must be copied to the working directory for the file to be imported properly.
            The plugin nodes and plugin parameters used in config.py should match the registered plugins
            in TensorRT. Please read the plugins documentation for more details.
    
            'convert-to-uff --input-file frozen_inference_graph.pb -O NMS -p config.py'
    

    It may be informative to compare your custom model to the one in the sample. By adding a -t option to convert-to-uff you get a text file of the converted graph in order to compare them.
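    One way to compare the two text dumps without reading them side by side is to count the operation types in each. This is only a rough sketch: it assumes the dump contains lines of the form operation: "..." (as in the node excerpt quoted later in this thread), and count_operations / diff_operations are hypothetical helper names, not part of the UFF tooling.

```python
import re
from collections import Counter

# Assumption: each node in the text dump has a line like  operation: "Reshape"
OP_PATTERN = re.compile(r'^[ \t]*operation:[ \t]*"([^"]+)"[ \t]*$', re.MULTILINE)


def count_operations(pbtxt_text):
    """Count how often each operation type appears in a text dump."""
    return Counter(OP_PATTERN.findall(pbtxt_text))


def diff_operations(reference_text, custom_text):
    """Return {op: (reference_count, custom_count)} for ops whose counts differ."""
    ref = count_operations(reference_text)
    cus = count_operations(custom_text)
    return {
        op: (ref[op], cus[op])  # Counter yields 0 for ops missing on one side
        for op in sorted(set(ref) | set(cus))
        if ref[op] != cus[op]
    }


def diff_files(reference_path, custom_path):
    """Convenience wrapper that reads both dumps from disk."""
    with open(reference_path) as f_ref, open(custom_path) as f_cus:
        return diff_operations(f_ref.read(), f_cus.read())
```

    Calling diff_files on the model zoo dump and the custom dump returns only the operations whose counts differ, which narrows down where the two graphs diverge.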

    Here are the details:
    Linux distro and version - Linux Ubuntu 16.04
    GPU type - GeForce GTX 1050 Ti
    nvidia driver version - 396.45
    CUDA version - 9.2
    CUDNN version - 7.1
    Python version [if using python] - 2.7.12
    Tensorflow version - 1.11.0
    TensorRT version - 4.0

    I have followed the instructions given in the README. When I export the custom-trained model to a frozen graph and compare it with the one in the model zoo, the two graphs (pb files) are very different. When I use the custom-trained model’s frozen graph for inference in TensorFlow, it works fine, but it fails when I convert it to UFF and plug it in for inference with TensorRT. (The attached files should give you detailed info.)

    The logs after adding -t option to convert-to-uff:

    ssd300_inception (model zoo): https://ufile.io/w9ouv
    ssd300_inception (custom): https://ufile.io/r5ymv

    For the custom one, when I enable the -t option of convert-to-uff, I get the following error:

    Traceback (most recent call last):
      File "convert_to_uff.py", line 110, in <module>
        main()
      File "convert_to_uff.py", line 105, in main
        output_filename=args.output
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 149, in from_tensorflow_frozen_model
        return from_tensorflow(graphdef, output_nodes, preprocessor, **kwargs)
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/conversion_helpers.py", line 120, in from_tensorflow
        name="main")
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 77, in convert_tf2uff_graph
        uff_graph, input_replacements)
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 64, in convert_tf2uff_node
        op, name, tf_node, inputs, uff_graph, tf_nodes=tf_nodes)
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 43, in convert_layer
        return cls.registry_[op](name, tf_node, inputs, uff_graph, **kwargs)
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter_functions.py", line 182, in convert_transpose
        tf_permutation_node).tolist()
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 122, in convert_tf2numpy_const_node
        np_dtype = cls.convert_tf2numpy_dtype(tf_node.attr['dtype'].type)
      File "/usr/lib/python2.7/dist-packages/uff/converters/tensorflow/converter.py", line 96, in convert_tf2numpy_dtype
        return np.dtype(dt[dtype])
    TypeError: data type "invalid" not understood
    

    Note that I am not doing anything different here. I am training the model on the COCO dataset according to the given guidelines, to see whether I can replicate the results of the model zoo’s ssd_inception_v2_coco. Where exactly am I going wrong?

    Hello,

    Our conversion script works for the ssd_inception_v2_coco_2017_11_17.tar.gz version of the trained SSD TF model. Can you try it? (http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz)

    convert-to-uff maps namespaces (as specified by config.py) to custom plugin nodes so that they can be run on TensorRT. If your model has different namespaces or a different graph layout, the same config script may not work and has to be modified.

    Please check out https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/graphsurgeon/graphsurgeon.html for how to do this, and use the sample as a reference.

    Just to be clear, did you use README #2’s pre-trained Tensorflow model (ssd_inception_v2_coco) from:
    http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz in either of your tests?

    ssd300_inception (model zoo): https://ufile.io/w9ouv
    ssd300_inception (custom): https://ufile.io/r5ymv

    I have no issues in running it with ssd_inception_v2_coco_2017_11_17.tar.gz version of the SSD trained TF model (from the model zoo).

    I understand that convert-to-uff maps namespaces (as specified by config.py) to custom plugin nodes so that they can be run on TensorRT. The whole issue boils down to why my model has different namespaces or a different graph layout than the one in the model zoo, when I am not doing anything different from what is specified in the guidelines.

    ssd300_inception (model zoo): https://ufile.io/w9ouv is the one which is tested with pre-trained Tensorflow model (ssd_inception_v2_coco)

    I am training it in TensorFlow, using the same feature extractor, the same dataset, and the same hyperparameters. So why are the graphs different? Why am I not able to reproduce the frozen graph given in the model zoo?

    Hello,

    You mentioned “the custom_ssd_inception is way different from what is given in the ssd_inception_v2_coco_2018_01_28 file in model zoo”. Could that result in different namespaces and/or a different graph layout? As described in an earlier post:

    "…If your model has different namespaces or a different graph layout, the same config script may not work and has to be modified."

    Please verify that the custom model has namespaces and a layout matching the associated config.py.
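    A quick way to do that check is to test whether every namespace that config.py maps to a plugin actually occurs among the node names of the custom graph. The sketch below is only illustrative: missing_namespaces is a hypothetical helper, and the expected namespaces shown are placeholders that should be replaced with the keys of the namespace map in your own config.py.

```python
def missing_namespaces(node_names, expected_namespaces):
    """Return the expected namespaces that no node name falls under.

    A node belongs to a namespace if its name equals the namespace or
    starts with the namespace followed by '/'.
    """
    missing = []
    for ns in expected_namespaces:
        prefix = ns + "/"
        if not any(name == ns or name.startswith(prefix) for name in node_names):
            missing.append(ns)
    return missing


# Node names taken from messages in this thread, plus a bare "concat" node.
example_nodes = ["Preprocessor/stack_1", "BoxPredictor_0/Reshape", "concat"]

# Placeholder list (assumption): use the namespaces from your config.py instead.
example_expected = ["Preprocessor", "Postprocessor", "concat"]

print(missing_namespaces(example_nodes, example_expected))
```

    Any namespace reported as missing is a place where the config script and the exported graph disagree.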

    I guess you didn’t understand the issue properly.

    By “the custom_ssd_inception is way different from what is given in the ssd_inception_v2_coco_2018_01_28” I meant the following: after training the model on the COCO dataset according to the guidelines (same feature extractor, same dataset, same hyperparameters) and exporting it to a frozen graph, the graph I get differs from the one in the model zoo. Why are they different? I didn’t make any customizations; I just trained the model and exported it to a frozen graph in TF. Why is my frozen graph different from the model zoo’s frozen graph?

    Hello,

    Understood. Would it be more appropriate to post your issue with exported frozen graph variance on the TensorFlow github? https://github.com/tensorflow/models/tree/master/research/object_detection

    I’m not sure TRT can answer “why my model has different namespaces or different graph layout than one in the model zoo”

    Some function calls change from version to version. If you want to reproduce the model from zoo, use the same tf version as they used to generate the .pb file.

    I tried with the same TF version, but it didn’t help. The graph is still different from the one in the model zoo.

    It would be nice if TRT provided sample code for converting to UFF and running inference with frozen graphs exported from the latest TF versions, rather than with old static frozen graph files that users are unable to reproduce. Nevertheless, thank you.

    Hey AdithyaP,
    I have been facing the exact same issue. Have you been able to find a solution to this?

    Thanks!

    Hello dl.roadtoai,

    No, I didn’t find a solution to this yet. I’ve posted this issue on tensorflow git issues.
    You can check the same here: https://github.com/tensorflow/models/issues/5640

    @AdithyaP I ran into this problem as well. I got around it by exporting my graph using an older version of tensorflow-models, commit hash ae0a9409212d0072938fa60c9f85740bb89ced7e

    @ketronmw’s answer + a couple extra steps worked for me:

    1. git checkout that commit hash within tensorflow/models
    2. Re-compile protocol files in tensorflow/models/research via
      protoc object_detection/protos/*.proto --python_out=.
      
    3. Comment out the following line in your pipeline.config file:
      # override_base_feature_extractor_hyperparams: true
      
    4. Re-export your trained model.
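    Step 3 can also be scripted if you have several configs to patch. A minimal sketch, assuming the field sits on its own line in the config text; comment_out_field is a hypothetical helper, not part of the Object Detection API:

```python
import re

FIELD = "override_base_feature_extractor_hyperparams"


def comment_out_field(config_text, field=FIELD):
    """Prefix every uncommented line that sets `field` with '# '.

    Lines that are already commented out are left untouched, so the
    function can safely be applied more than once.
    """
    pattern = re.compile(
        r"^([ \t]*)(%s[ \t]*:.*)$" % re.escape(field), re.MULTILINE
    )
    return pattern.sub(r"\1# \2", config_text)


def patch_config_file(path):
    """Rewrite the pipeline config at `path` in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(comment_out_field(text))
```

    Run patch_config_file on a copy of pipeline.config first, so the original stays available for training with newer versions of the API.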

    Hi,

    I’m getting the below error while parsing the UFF.

    [TensorRT] ERROR: UffParser: Validator error: concat_box_loc: Unsupported operation _FlattenConcat_TRT

    I can see the operation in the .pbtxt file as below.

    nodes {
    id: "concat_box_loc"
    inputs: "BoxPredictor_0/Reshape"
    inputs: "BoxPredictor_1/Reshape"
    inputs: "BoxPredictor_2/Reshape"
    inputs: "BoxPredictor_3/Reshape"
    inputs: "BoxPredictor_4/Reshape"
    inputs: "BoxPredictor_5/Reshape"
    operation: "_FlattenConcat_TRT"

    Any help is appreciated.

    It seems that this can be worked around by passing the batch size to the export_inference_graph.py script. For example, for a batch size of 1:

    --input_shape=1,-1,-1,3
    

    However, this still doesn’t work with the most recent version of the Object Detection API; it only lets you use more recent versions than you could without the workaround.
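    A note on why fixing the batch size might help: a reshape target can contain at most one -1, because only a single unknown size can be inferred from the element count. My guess (not confirmed anywhere in this thread) is that with a dynamic batch dimension the converted BoxPredictor_0/Reshape ends up with two unknown dimensions. Below is a minimal pure-Python sketch of that rule; resolve_reshape is a hypothetical helper, not part of TensorRT or UFF, and the sizes are made up.

```python
def resolve_reshape(num_elements, target_shape):
    """Resolve a single -1 in target_shape from the total element count.

    Raises ValueError if more than one dimension is -1, because the
    missing sizes can then no longer be determined uniquely.
    """
    unknown = [i for i, d in enumerate(target_shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("-1 dimension specified more than 1 time")

    # Product of all explicitly given dimensions.
    known = 1
    for d in target_shape:
        if d != -1:
            known *= d

    if not unknown:
        if known != num_elements:
            raise ValueError("element count mismatch")
        return tuple(target_shape)

    if known == 0 or num_elements % known != 0:
        raise ValueError("element count mismatch")

    resolved = list(target_shape)
    resolved[unknown[0]] = num_elements // known
    return tuple(resolved)


# Batch size fixed to 1: exactly one unknown, so the shape can be resolved.
print(resolve_reshape(400, (1, -1, 1, 4)))  # (1, 100, 1, 4)

# Batch size unknown as well: two -1 entries, which is ambiguous.
try:
    resolve_reshape(400, (-1, -1, 1, 4))
except ValueError as err:
    print("rejected:", err)
```

    With the batch size given at export time, only one dimension is left to infer, which would be consistent with the workaround above.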


    I reverted to the commit you mentioned but got the following error.

    ValueError: SSD Inception V2 feature extractor always usesscope returned by `conv_hyperparams_fn` for both the base feature extractor and the additional layers added since there is no arg_scope defined for the base feature extractor.
    

    After uncommenting override_base_feature_extractor_hyperparams: true in the config, I got:

    TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)
    

    Can you tell me which versions of TensorFlow and the pretrained model you used?
