Custom U-Net model: conversion, loading and inference

Hi!

I’ve been working on a custom U-Net model based on [url]https://arxiv.org/abs/1909.00166[/url] that uses bidirectional LSTM layers in the skip connections. I trained it in Keras on TensorFlow on my GPU. Everything works fine there.

I’ve been able to load it directly on the Jetson Nano without any kind of optimization: a simple Keras load_model from a saved model. But the inference time is around 2 seconds per frame, and the RAM usage ramps up to 3.7GB + 1GB of swap…
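For reference, the loading and inference on the Nano is just the plain Keras path, roughly like this (the file name and input shape are placeholders for my actual model):

import time
import numpy as np
from tensorflow.keras.models import load_model

# 'bcdu_net.h5' and the input shape are placeholders for my actual saved model.
# compile=False skips restoring the custom loss/optimizer, which is enough for inference.
model = load_model('bcdu_net.h5', compile=False)

frame = np.random.rand(1, 256, 256, 1).astype(np.float32)
start = time.time()
mask = model.predict(frame)
print('inference time: %.2f s per frame' % (time.time() - start))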

I’ve read about TensorRT and its integration with the Jetson Nano, so I have given it a try. But… No success at all.

I’ve followed several links, blogs and tutorials, but I always get some kind of error, either when converting the model to TRT or UFF, or when importing it on the Nano.

Is there a particular thing that I’m missing?

Thanks,

Hi,

We haven’t tried the ‘Bi-Directional ConvLSTM U-Net’ before.
Could you share the model (.pb/.onnx) or the error message with us?

By the way, may I know which backend framework you use?
https://keras.io/backend/

Thanks.

Hi,

Thanks for your response.

The backend that I’ve been using is TensorFlow, so we could say I use ‘Keras on TensorFlow’.

I can share the .pb file, but I use some custom objects (mainly the loss, accuracy and optimizer) that are not available by default. I can also share the code if needed; just let me know.

The error I get is different depending on whether I try to convert it from Keras to TRT directly, or from Keras to a TensorFlow frozen graph to UFF to TRT.

If I try to convert the frozen graph to UFF format using convert_to_uff.py, the error I get is caused by the ConvLSTM2D layer:

uff.model.exceptions.UffException: Const node conversion requested, but node is not Const
name: "conv_lst_m2d_1/while/BiasAdd_2/Enter"
op: "Enter"
input: "conv_lst_m2d_1/strided_slice_10"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "frame_name"
  value {
    s: "conv_lst_m2d_1/while/while_context"
  }
}
attr {
  key: "is_constant"
  value {
    b: true
  }
}
attr {
  key: "parallel_iterations"
  value {
    i: 32
  }
}
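
For completeness, the UFF conversion I’m attempting is essentially equivalent to this call (the file and node names below are placeholders for my actual graph); the exception above is raised at this step:

import uff

# 'frozen_model.pb' and the output node name are placeholders for my frozen
# graph and its output op; the UffException above is thrown during this call.
uff.from_tensorflow_frozen_model(
    'frozen_model.pb',
    output_nodes=['output_node_placeholder'],
    output_filename='model.uff'
)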

I somehow managed to convert and save a TF-TRT graph using this code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tensorflow.python.framework import graph_io
from tensorflow.keras import backend as K


def freeze_graph(graph, session, output, save_pb_dir='.', save_pb_name='frozen_model.pb', save_pb_as_text=False):
    """Strip training nodes, fold variables into constants and save the frozen graph."""
    with graph.as_default():
        graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
        graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
        graph_io.write_graph(graphdef_frozen, save_pb_dir, save_pb_name, as_text=save_pb_as_text)
        return graphdef_frozen


# 'model' is the Keras model loaded above; grab the TF session backing it.
session = K.get_session()
save_pb_dir = './models'  # output directory for the frozen graph

input_names = [t.op.name for t in model.inputs]
output_names = [t.op.name for t in model.outputs]

# Prints input and output node names; take note of them for inference later.
print(input_names, output_names)

frozen_graph = freeze_graph(session.graph, session, output_names, save_pb_dir=save_pb_dir)

# TF-TRT: replace supported subgraphs of the frozen graph with TensorRT engines.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

graph_io.write_graph(trt_graph, "./models/", "trt_graph.pb", as_text=False)
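
A quick sanity check that can be run on the converted graph is to count how many TF-TRT engine nodes it actually contains; if this prints 0, no subgraph was offloaded to TensorRT at all:

# Count the TF-TRT engine nodes in the converted graph. With
# minimum_segment_size=50 and the unsupported LSTM loop, this can easily be 0.
num_engines = len([n for n in trt_graph.node if n.op == 'TRTEngineOp'])
print('TRTEngineOp nodes: %d / %d total nodes' % (num_engines, len(trt_graph.node)))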

But when I load the new “trt_graph.pb” and try to predict with this:

import tensorflow as tf
from tensorflow.keras import backend as K


def get_frozen_graph(graph_file):
    """Read a frozen GraphDef file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def


trt_graph = get_frozen_graph('./models/trt_graph.pb')
K.set_learning_phase(False)
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)

tf.import_graph_def(trt_graph, name='')

# input_tensor_name / output_tensor_name are the node names printed during
# freezing, with ':0' appended; image_sample is the preprocessed input frame.
output_tensor = tf_sess.graph.get_tensor_by_name(output_tensor_name)
feed_dict = {
    input_tensor_name: image_sample
}

predicted_mask = tf_sess.run(output_tensor, feed_dict)

The RAM usage rises to 3.7GB + 3GB of swap, for a total of almost 7GB(!), the model load time is around 3 minutes, and the inference time is maintained at around 2 seconds per frame.

I don’t know if my model is simply too complex for the Jetson Nano to be able to save it as UFF (because of the ConvLSTM2D layers), but what I don’t understand at all is why, after converting it to TRT, the inference time stays the same while the RAM usage increases by 2 more GB.

Any kind of help would be appreciated, even if the answer is that the model is too complex.

Thanks!

Any updates on this?

My biggest concern now is the VERY HIGH RAM usage when loading any model into the Jetson Nano:

  • On a Raspberry Pi 4, the model I’ve been talking about reserves around 700MB of RAM and it takes around 10 seconds to make a prediction (pure CPU).
  • On the Jetson Nano, the same model reserves around 5GB of RAM and it takes around 2 seconds to make the prediction.

So, for a roughly 5x faster prediction, I’m reserving almost 8x the RAM! On top of that, the loading time on the Nano is several times longer.

I’ve been thinking that I’m doing something wrong, but I’m totally out of options here.

Hi,

Sorry for the late update.

There are two possible ways to use TensorRT: TF-TRT and pure TensorRT.

TF-TRT is used through the TensorFlow (Keras) interface, which still occupies a lot of memory.
It’s known that TensorFlow may occupy 2x or more memory in GPU mode on the Jetson platform.
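
As a partial mitigation (just a sketch; the fraction is a value you would need to tune), you can cap how much memory TensorFlow is allowed to map for the GPU, since the Nano’s CPU and GPU share the same physical memory:

import tensorflow as tf

# Limit TensorFlow's GPU memory mapping instead of letting it take everything.
# The 0.5 fraction is only an example value to tune for your model.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)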

It’s recommended to use pure TensorRT, which converts .pb → .uff → TRT engine.
Could you share the error you get when converting the .pb file into .uff? (Or is the error from the .uff → TRT engine step?)
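
For reference, building the engine from the .uff file with the TensorRT Python API looks roughly like this (the input/output node names and the input shape are placeholders for your model):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# 'model.uff', the node names and the CHW input shape below are placeholders.
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, \
        trt.UffParser() as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28
    builder.fp16_mode = True

    parser.register_input('input_1', (1, 256, 256))
    parser.register_output('output_node_placeholder')
    parser.parse('model.uff', network)

    # Build and serialize the engine for deployment on the Nano.
    engine = builder.build_cuda_engine(network)
    with open('model.engine', 'wb') as f:
        f.write(engine.serialize())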

Bidirectional LSTM is supported in TensorRT 5.1 via the RNNv2 layer:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#rnnv2-layer
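
If the UFF parser cannot handle the ConvLSTM loop, the recurrent part can in principle be added manually through the network API. A rough sketch only (note that RNNv2 implements a standard fully-connected LSTM, not a ConvLSTM, so the ConvLSTM2D weights cannot be copied over directly; the dimensions below are placeholders):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()

# Placeholder dimensions for a bidirectional LSTM added via the RNNv2 layer.
seq_len, input_size, hidden_size = 32, 256, 256
data = network.add_input('lstm_in', trt.float32, (seq_len, input_size))

lstm = network.add_rnn_v2(data, 1, hidden_size, seq_len, trt.RNNOperation.LSTM)
lstm.direction = trt.RNNDirection.BIDIRECTION

# The trained weights/biases then have to be copied gate by gate with
# lstm.set_weights_for_gate(...) and lstm.set_bias_for_gate(...).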

Thanks.