model accuracy penalty with tensorRT on jetson TX2

Hi,

I'm using TensorRT to run inference of a Keras model on a Jetson TX2 device.
The TensorRT outputs differ from the outputs of the original Keras model, which leads to lower accuracy.

CUDA version 9.0.252
TensorRT version 4.1.3

(on host) The model is taken from: https://github.com/uzh-rpg/rpg_public_dronet/tree/master/model

(on host) Keras-to-TensorFlow .pb conversion is done with: https://github.com/amir-abdi/keras_to_tensorflow

.pb-to-UFF conversion is done with: https://pastebin.com/UcuvHV1k (adapted from an NVIDIA sample)
Engine creation and inference code: https://pastebin.com/EE2Ns6cp (adapted from the NVIDIA UFF MNIST sample)

I suspected that the data ordering (CHW vs. HWC) was the source of the issue, but since I'm using only grayscale (single-channel) images, the ordering should make no difference, so in the end I'm not sure what the issue is.
Perhaps TRT is doing some optimizations that impact the accuracy?
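The layout argument above can be checked directly. The sketch below (plain Python; `hwc_to_chw` is a hypothetical helper, not part of TensorRT or Keras) reorders a flattened HWC buffer into CHW order: with one channel the two orders coincide, with three they do not.

```python
def hwc_to_chw(flat_hwc, height, width, channels):
    """Reorder a flattened HWC buffer (Keras/NumPy default) into CHW order."""
    chw = [0.0] * (height * width * channels)
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                src = (y * width + x) * channels + c   # index in HWC layout
                dst = (c * height + y) * width + x     # index in CHW layout
                chw[dst] = flat_hwc[src]
    return chw

# Grayscale: with a single channel the reorder is the identity.
gray = [float(i) for i in range(4 * 3 * 1)]
assert hwc_to_chw(gray, 4, 3, 1) == gray

# RGB: with three channels the reorder genuinely changes the buffer.
rgb = [float(i) for i in range(2 * 2 * 3)]
print(hwc_to_chw(rgb, 2, 2, 3))
```

This only shows that the *input* buffer is unaffected. Intermediate feature maps inside the network have many channels, so this argument alone does not rule out a layout mismatch deeper in the graph.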

Thanks
Sagiv

*This is a repost of this thread, which did not get any responses: https://devtalk.nvidia.com/default/topic/1055217/tensorrt/model-accuracy-penalty-with-tensorrt-on-jetson-tx2/

Hi,

The source code link is somehow broken. Would you mind checking it again?

It’s recommended to check the image pre-processing step first.
In the MNIST sample, we preprocess the image by subtracting the mean from the PGM file.
This may not be identical to the Keras preprocessing you used.
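To see how far apart two preprocessing paths can be, here is a minimal sketch comparing mean subtraction with the 1/255 rescaling that is common in Keras pipelines. Both the mean value (127.5) and the assumption that the Keras side rescales by 1/255 are illustrative only; check the actual DroNet and sample code before relying on either.

```python
def preprocess_mean_subtract(pixels, mean):
    """Subtract a (hypothetical) scalar mean from raw 8-bit pixel values."""
    return [float(p) - mean for p in pixels]

def preprocess_rescale(pixels, scale=1.0 / 255.0):
    """Scale raw 8-bit pixel values into [0, 1]; assumed to match the Keras side."""
    return [float(p) * scale for p in pixels]

raw = [0, 128, 255]
a = preprocess_mean_subtract(raw, mean=127.5)
b = preprocess_rescale(raw)
print(a)  # [-127.5, 0.5, 127.5]
print(b)
```

The same raw pixels end up in very different numeric ranges, so a network trained on one path will produce unrelated outputs when fed the other.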

Thanks.

Hi AaastaLLL,

Thanks for reply.
Which of the links are broken? They all seem to be OK on my end.

I actually did several tests to verify that the images are sent in exactly the same way. Just to be sure, I sent entirely blank (white) images to both networks and got different results.
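A blank-image test like this is most informative with an explicit numeric tolerance, since TensorRT is not expected to reproduce Keras bit-for-bit. A minimal sketch, assuming both sets of outputs have been copied into plain Python lists; `blank_image`, `compare_outputs` and the tolerance values are hypothetical choices, and white is assumed to map to 1.0 after preprocessing.

```python
def blank_image(height, width, value):
    """Flattened single-channel test image with every pixel set to `value`."""
    return [float(value)] * (height * width)

def compare_outputs(reference, candidate, atol=1e-4, rtol=1e-3):
    """Return (max_abs_diff, all_close) for two equally sized output vectors."""
    if len(reference) != len(candidate):
        raise ValueError("output vectors differ in length")
    max_abs = 0.0
    all_close = True
    for r, c in zip(reference, candidate):
        diff = abs(r - c)
        max_abs = max(max_abs, diff)
        if diff > atol + rtol * abs(r):
            all_close = False
    return max_abs, all_close

# Same input for both frameworks (assumption: white == 1.0 after preprocessing).
white = blank_image(200, 200, 1.0)

# Synthetic outputs, only to show the call: these agree within tolerance.
print(compare_outputs([0.25, -0.1], [0.25001, -0.1]))
```

Small differences (around 1e-5) are normal floating-point noise; differences well outside the tolerance point to a real conversion or preprocessing problem.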

What else can I do to find the issue?

Hi,

I cannot open the pastebin links, neither https://pastebin.com/UcuvHV1k nor https://pastebin.com/EE2Ns6cp.

Would you mind sharing the full source on GitHub or attaching it directly to the comment?
Thanks.

Here is the TF to UFF translation:

import tensorflow as tf
import sys
from tensorflow.python.platform import gfile

from tensorflow.core.protobuf import saved_model_pb2
from tensorflow.python.util import compat
import uff

UFF_OUTPUT_FILENAME = 'model_tensorrt.uff'

#OUTPUT_NAMES = ["output_names"]

with tf.Session() as persisted_sess:
  print("load graph")
  with gfile.FastGFile("../rpg_public_dronet/model/model_tensorflow.pb",'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    persisted_sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')
    writer = tf.summary.FileWriter("./tf_summary", graph=persisted_sess.graph)
    # Print all operation names
    #for op in persisted_sess.graph.get_operations():
    #  print(op)



import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)


# Load your newly created Tensorflow frozen model and convert it to UFF
uff_model = uff.from_tensorflow_frozen_model("../rpg_public_dronet/model/model_tensorflow.pb", ["activation_8/Sigmoid", "dense_1/BiasAdd"], output_filename=UFF_OUTPUT_FILENAME)
uff.from_tensorflow(graphdef=frozen_graph,
                        output_filename=UFF_OUTPUT_FILENAME,
                        output_nodes=OUTPUT_NAMES,
                        text=True)

# Create a UFF parser to parse the UFF file created from your TF Frozen model
#parser = uffparser.create_uff_parser()
#parser.register_input("input_1", (200,200,1),0)
#parser.register_output("activation_8/Sigmoid")

and the engine creation and inference functions:

int inference(void* p_engine, void* p_context, float *input_img, float output_arr[NUM_OF_OUTPUTS])
{
  /* 
   * Get an image buffer ready for inference and run the NN on it.
   * The image is expected to be AFTER all preprocessing steps -
   *  cropping, resizing, rescaling and normalization (unless this is done by batchnorm).
   */
  LOG("TRTLib: clearing output array\n");
  memset(output_arr, 0, (sizeof(float) * NUM_OF_OUTPUTS));
  
  LOG("TRTLib: assigning from input pointers\n");
  
  ICudaEngine &engine = *((ICudaEngine*)p_engine);
  IExecutionContext* context = (IExecutionContext*)p_context;
  

  LOG("TRTLib: getting bindings from engine\n");
  int batchSize = 1;

  int nbBindings = engine.getNbBindings();
  assert(nbBindings == TOTAL_BINDINGS);

  std::vector<void*> buffers(nbBindings);
  auto buffersSizes = calculateBindingBufferSizes(engine, nbBindings, batchSize);

  int bindingIdxInput = 0;
  for (int i = 0; i < nbBindings; ++i)
  {
    if (engine.bindingIsInput(i))
    {
      bindingIdxInput = i;
    }
    else
    {
      auto bufferSizesOutput = buffersSizes[i];
      buffers[i] = safeCudaMalloc(bufferSizesOutput.first *
                                  elementSizeTrt(bufferSizesOutput.second));
    }
  }

  auto bufferSizesInput = buffersSizes[bindingIdxInput];

  LOG("TRTLib: creating buffer for input \n");

  buffers[bindingIdxInput] = createImageCudaBuffer(bufferSizesInput.first,
                                                   bufferSizesInput.second, input_img);

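  LOG("TRTLib: executing inference\n");
  /* Run the engine synchronously. Without this call the output buffers are never
   * written and getOutputs() only reads uninitialized device memory. */
  if (!context->execute(batchSize, buffers.data()))
    LOG("TRTLib: execute() reported a failure\n");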

  LOG("TRTLib: moving output from GPU to host\n");

  int output_idx = 0;
  for (int bindingIdx = 0; bindingIdx < nbBindings; ++bindingIdx)
  {
    float output;
    
    if (engine.bindingIsInput(bindingIdx))
      continue;

    auto bufferSizesOutput = buffersSizes[bindingIdx];
    output = getOutputs(bufferSizesOutput.first, bufferSizesOutput.second,
                        buffers[bindingIdx], bindingIdx);
    
    LOG("assigning output %f in array slot %d\n", output, output_idx);
    output_arr[output_idx++] = output;
  }
  
  LOG("TRTLib: clean GPU mem\n");
  
  CHECK(cudaFree(buffers[bindingIdxInput]));

  for (int bindingIdx = 0; bindingIdx < nbBindings; ++bindingIdx)
    if (!engine.bindingIsInput(bindingIdx))
      CHECK(cudaFree(buffers[bindingIdx]));
      
  
  LOG("TRTLib: DONE\n");
  
  return 0;
}


int build_engine(std::string uff_path, uint8_t input_shape[2], void** out_engine, void** out_context)
{
  /*
   * This function will prepare a tensorRT engine, ready for inference jobs.
   * It should be called only once per NN.
   * 
   * @uff_path    : Full path to .uff model file.
   *                Note that this is not completely flexible, as input/output
   *                   size/names are hardcoded in the 'trtinference.h' file.
   * @input_shape : Integer array for input image size. should be [Height, Width].
   *                Only grayscale images (single channel) are supported now.
   */
  *out_engine = NULL;
  *out_context = NULL;
  
  LOG("TRTlib: %s\n", uff_path.c_str());
  LOG("TRTlib: %u,%u\n", input_shape[0], input_shape[1]);

  int maxBatchSize = 1;
  auto parser = createUffParser();

  INPUT_H = input_shape[0];
  INPUT_W = input_shape[1];

  /* Register tensorflow input */
  parser->registerInput(INPUT_BINDING_NAME,
                        Dims3(INPUT_C, INPUT_H, INPUT_W),
                        UffInputOrder::kNCHW);
  parser->registerOutput(OUTPUT_1_BINDING_NAME);
  parser->registerOutput(OUTPUT_2_BINDING_NAME);

  ICudaEngine* engine = loadModelAndCreateEngine(uff_path.c_str(), maxBatchSize, parser);

  if (!engine) {
    std::cout << "Failed to create engine" << std::endl;
    return -1;
  }

  /* we dont need to keep the memory created by the parser */
  parser->destroy();
  
  IExecutionContext* context = engine->createExecutionContext();

  *out_engine = (void*)engine;
  *out_context = (void*)context;

  return 0;
}
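One quick check on the code above is that the input buffer passed to createImageCudaBuffer has the size the engine expects. The helpers calculateBindingBufferSizes and elementSizeTrt come from the NVIDIA samples and are not shown here, so this sketch only assumes they compute element count times element size. `binding_bytes` is a hypothetical name, all bindings are assumed to be float32, INPUT_C is assumed to be 1, and each output is assumed to be a single float, matching the single `float` that getOutputs returns above.

```python
from functools import reduce
import operator

FLOAT32_BYTES = 4  # assumption: every binding is a 32-bit float

def binding_bytes(dims, batch_size=1, element_size=FLOAT32_BYTES):
    """Bytes needed for one binding: batch * product(dims) * sizeof(element)."""
    return batch_size * reduce(operator.mul, dims, 1) * element_size

input_bytes = binding_bytes((1, 200, 200))  # C, H, W for a 200x200 grayscale input
output_bytes = binding_bytes((1,))          # one scalar per output head
print(input_bytes, output_bytes)  # 160000 4
```

If the buffer built from `input_img` on the host side is not exactly this size, the engine will read past or short of the image data.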

I will soon have the full source code on GitHub and will update here when it is up.
Can you see any issue with what I posted here?

BR
Sagiv

Hi,

It looks like you commented out the output layer name:

#OUTPUT_NAMES = ["output_names"]

This introduces an error.
The UFF parser generates the layers between the input and the output using a topological sort.
The input is detected automatically, but the output is a user-supplied variable for flexibility.

If the output name is not correct, the sorting result will be unpredictable.
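The role of the output name can be illustrated with a toy version of that process: walk backwards from the requested outputs to collect the nodes they depend on, then topologically sort them. This is a hypothetical sketch (`prune_and_sort` and the tiny graph are made up), not the UFF converter's actual implementation; in particular it raises an error on an unknown output name, whereas the real result is described above as unpredictable.

```python
from collections import deque

def prune_and_sort(inputs_of, outputs):
    """Return the nodes needed by `outputs`, in topological (Kahn) order.

    `inputs_of` maps every node name to the list of node names it consumes.
    """
    # 1. Walk backwards from the requested outputs to collect required nodes.
    needed = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name not in inputs_of:
            raise KeyError(f"node {name!r} not found in graph")
        if name not in needed:
            needed.add(name)
            stack.extend(inputs_of[name])

    # 2. Kahn's algorithm on the pruned subgraph (sorted for determinism).
    indegree = {n: len(inputs_of[n]) for n in needed}
    consumers = {n: [] for n in needed}
    for n in sorted(needed):
        for src in inputs_of[n]:
            consumers[src].append(n)
    ready = deque(sorted(n for n in needed if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in consumers[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return order

# Toy graph: two heads share a trunk; "unused_head" is not requested.
graph = {
    "input_1": [],
    "conv": ["input_1"],
    "flatten": ["conv"],
    "dense_1/BiasAdd": ["flatten"],
    "activation_8/Sigmoid": ["flatten"],
    "unused_head": ["conv"],
}
print(prune_and_sort(graph, ["activation_8/Sigmoid", "dense_1/BiasAdd"]))
```

Requesting a name that is not in the graph, such as the placeholder "output_names", fails immediately here, and any node not reachable from the requested outputs is silently dropped.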
Thanks.

Hi AastaLLL,

Sorry for the late reply. You were correct about the code I uploaded, but that does not solve my issue: I accidentally pasted an incorrect version of the file. I do not use the uff.from_tensorflow() function to load the PB model.
I only use uff.from_tensorflow_frozen_model() to load the PB model (and save it immediately to UFF_OUTPUT_FILENAME). This works correctly, and as you can see in the updated version below, OUTPUT_NAMES is never used.
Please note that in the build_engine() function (which loads the UFF file saved earlier), I use the OUTPUT_1_BINDING_NAME and OUTPUT_2_BINDING_NAME constants, which are defined as:

#define INPUT_BINDING_NAME "input_1"
#define OUTPUT_1_BINDING_NAME "activation_8/Sigmoid"
#define OUTPUT_2_BINDING_NAME "dense_1/BiasAdd"

and the input given for uff_path is the same UFF_OUTPUT_FILENAME as before.

To further analyse the issue, I sent blank (all-white and all-black) images to both the regular and the TensorRT model, without the surrounding code, and the results were unrelated. I conclude from this that something is wrong with the TensorRT version itself, since both models got exactly the same data. What do you think?
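Since whole-network outputs already differ for identical inputs, one way to narrow this down is to compare intermediate tensors and find the first layer where the two frameworks disagree. A minimal sketch, assuming the per-layer outputs have already been dumped from both sides into plain Python lists keyed by layer name (for example by registering intermediate node names as extra outputs, the same way build_engine registers the two existing ones) and converted to the same memory layout; `first_divergent_layer`, the layer names and the tolerance are hypothetical.

```python
def first_divergent_layer(reference, candidate, layer_order, atol=1e-3):
    """Return (layer, max_abs_diff) for the first mismatching layer, or None.

    `reference` and `candidate` map layer names to flattened outputs
    (e.g. Keras vs. TensorRT), already converted to the same layout.
    """
    for name in layer_order:
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand):
            return name, float("inf")  # a size mismatch is itself a divergence
        worst = max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)
        if worst > atol:
            return name, worst
    return None

# Synthetic dumps: block_1 agrees, block_2 has two elements swapped.
keras_out = {"block_1": [0.1, 0.2], "block_2": [0.1, 0.2, 0.3, 0.4]}
trt_out = {"block_1": [0.1, 0.2], "block_2": [0.1, 0.3, 0.2, 0.4]}
print(first_divergent_layer(keras_out, trt_out, ["block_1", "block_2"]))
```

Walking the layers in graph order means the first reported layer is the earliest point of divergence, which is usually the one worth inspecting.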

It would be great to get your input on this still-open issue…

Thanks,
Sagiv

Updated TF-to-UFF script:

import tensorflow as tf
import sys
from tensorflow.python.platform import gfile

from tensorflow.core.protobuf import saved_model_pb2
from tensorflow.python.util import compat
import uff

UFF_OUTPUT_FILENAME = 'model_tensorrt.uff'


with tf.Session() as persisted_sess:
  print("load graph")
  with gfile.FastGFile("../rpg_public_dronet/model/model_tensorflow.pb",'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    persisted_sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')
    writer = tf.summary.FileWriter("./tf_summary", graph=persisted_sess.graph)




import tensorrt as trt
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)


# Load your newly created Tensorflow frozen model and save it to UFF
uff_model = uff.from_tensorflow_frozen_model("../rpg_public_dronet/model/model_tensorflow.pb", ["activation_8/Sigmoid", "dense_1/BiasAdd"], output_filename=UFF_OUTPUT_FILENAME)