Parsing Caffe model with LSTM layer

Good morning,

I am trying to convert a Caffe model to TensorRT.
However, the Caffe parser does not support the LSTM layer. On the other hand, TensorRT has its own LSTM layer:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_302/tensorrt-developer-guide/index.html#layers

My question therefore is: is it possible to parse the Caffe model and add the LSTM layer in some way, or do I have to write a custom IPlugin for that layer?

Thank you!

Hi Daniele,

Unfortunately, the nvidia caffe parser isn’t going to help you here. You’re going to have to write your own. Parsing the Caffe LSTM layers into TensorRT is a little tricky (I’ve done it), but it’s not impossible. I’d suggest you look at TensorRT’s builder API and at the Caffe implementation to see how that works.

Good luck,
Tom

Hi Tom,

thanks for your answer. I expected that, but I had some hope…

By the way, I have some experience writing custom IPlugins, but I have only handled simple layers, e.g. sigmoid and reshape. I am a bit scared of the LSTM.

I was thinking of starting from the Caffe implementation itself, https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lstm_layer.cpp

but I’d rather avoid including all the Caffe directories.

May I ask what your experience was? Did you develop the layer based on the link above, using other Caffe layers, or did you write the whole layer step by step?

Thank you for your answer!

Daniele

Hi Daniele,

If I get some time, I’ll try to provide more details, but for now I’ll try to sketch my approach.

I didn’t include any Caffe directories, but I drew inspiration from their implementation. Caffe itself doesn’t have a “pure” LSTM layer; instead they implement it from other layers. I’d suggest taking a look at the graph they produce for LSTMs.

In order to do this for TensorRT, I used their “builder” API and preprocessed my prototxt to expand the LSTM node into a graph somewhat similar to the Caffe one (this involved inserting a bunch of layers: a reshape, a “delta product”, an eltwise, a tile, and maybe others). Tile is not supported in TRT, so I wrote a custom layer (simple implementations are not terribly difficult). One further trickiness is handling the state of the LSTM – you have to connect certain outputs (of the LSTM layer) back as inputs (to the LSTM layer) in order to set up the recurrent structure. I managed that in our own wrapper of the TRT network.
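To make the prototxt-surgery idea concrete, here is a toy Python sketch (not Tom’s actual code; the layer names, the `__` suffixes, and the particular expansion are all illustrative assumptions). It rewrites each flat LSTM layer block into a small subgraph of sublayers; a real implementation would parse the prototxt with the caffe.proto text-format parser rather than regexes, and the expansion would mirror Caffe’s actual unrolled graph.

```python
import re

def expand_lstm(prototxt: str) -> str:
    """Toy 'prototxt surgery': replace each LSTM layer block with an
    expanded subgraph (reshape -> tile -> eltwise -> lstm core).
    Assumes flat layer blocks with no nested braces."""
    pattern = re.compile(r'layer\s*\{[^{}]*type:\s*"LSTM"[^{}]*\}', re.DOTALL)

    def replacement(match):
        block = match.group(0)
        name = re.search(r'name:\s*"([^"]+)"', block).group(1)
        bottoms = re.findall(r'bottom:\s*"([^"]+)"', block)
        top = re.search(r'top:\s*"([^"]+)"', block).group(1)
        x, delta = bottoms[0], bottoms[1]
        # Hypothetical expansion; double underscores mark inserted nodes.
        return "\n".join([
            f'layer {{ name: "{name}__reshape" type: "Reshape" '
            f'bottom: "{x}" top: "{name}__xr" }}',
            f'layer {{ name: "{name}__tile" type: "Tile" '
            f'bottom: "{delta}" top: "{name}__dt" }}',
            f'layer {{ name: "{name}__gate" type: "Eltwise" '
            f'bottom: "{name}__xr" bottom: "{name}__dt" '
            f'top: "{name}__gated" }}',
            f'layer {{ name: "{name}" type: "LSTM" '
            f'bottom: "{name}__gated" top: "{top}" }}',
        ])

    return pattern.sub(replacement, prototxt)
```

Feeding the rewritten prototxt to the parser then only requires custom plugins for the inserted sublayers that TRT lacks (here, Tile).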

Good luck,
Tom

Hi Tom,

thank you for your answer. I appreciate your effort.

What worries me the most is the insertion of a bunch of layers. In my previous experience, writing a custom IPlugin for a layer not supported by the Caffe parser was straightforward, e.g. writing an ad-hoc Softmax IPlugin.

Because this is how I understand the IPlugin customization:

  • You parse the prototxt file
  • Oops, you find a non-supported layer. OK, so let’s write a custom IPlugin layer.

So I have an LSTM layer in the prototxt, with certain inputs and outputs. The custom plugin should be of the same type. I understand that the LSTM cell has operations that need to be modeled.

But if I understand the graph-expansion process correctly, you suggest modifying the structure of the network itself, i.e. decomposing the LSTM in the prototxt into its subparts and THEN parsing the model, having written custom implementations of the non-supported sublayers?

Thank you a lot, I feel I am getting close to what I have to do.

Daniele

Hi Daniele,

But if I understand the graph-expansion process correctly, you suggest modifying the structure of the network itself, i.e. decomposing the LSTM in the prototxt into its subparts and THEN parsing the model, having written custom implementations of the non-supported sublayers?

That is indeed how I did this – it’s certainly not the only possible approach. Just a little surgery on a graph. Caffe does their own surgery too.

Here are some diagrams which showed the local transformation I did: https://gist.github.com/tdp2110/f612f7007f7fa9fc81104affe63da4c0

(By the way, @NVES, your upload image feature doesn’t work)

The delta input, I believe, was a scalar for clearing the memory. The second, larger image shows my processed proto (the processing was done in memory), and only in that network does the lstm node correspond to a network->addRNNv2 call. The double underscores in the diagram mark nodes that were not present in the original proto. The tile is a custom layer and is not too hard to implement.
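As a point of reference for how simple the Tile operation itself is, here is a pure-Python sketch of a Caffe-style Tile (repeat a blob N times along an axis), using nested lists in place of device buffers; a CUDA plugin version is essentially the same indexed copy in a kernel:

```python
def tile(data, axis, tiles):
    """Caffe-style Tile: repeat the nested-list blob `tiles` times
    along dimension `axis`. Illustrative host-side sketch only."""
    if axis == 0:
        return [row for _ in range(tiles) for row in data]
    return [tile(row, axis - 1, tiles) for row in data]
```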

As I mentioned before, unlike Caffe, the TRT RNN layer does not manage the recurrent connections for you: you have to feed the cell/hidden outputs back as inputs (e.g. using some sort of cudaMemcpy) after your calls to forward on your network. (You’ll also need to initialize those LSTM cell/hidden inputs, probably to zero.) This is probably why their nvcaffe_parser doesn’t handle this type of layer.
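The state-plumbing loop can be sketched in plain Python. Here `step` stands in for one forward call on the engine, and the list copies stand in for the device-to-device cudaMemcpy of the hidden/cell bindings; the function name and shapes are illustrative assumptions, not a TRT API:

```python
def run_sequence(step, inputs, state_size):
    """Drive a stateless RNN step over a sequence, manually feeding the
    hidden/cell outputs of each forward pass back in as the next inputs."""
    h_in = [0.0] * state_size   # initialize hidden state to zero
    c_in = [0.0] * state_size   # initialize cell state to zero
    outputs = []
    for x in inputs:
        y, h_out, c_out = step(x, h_in, c_in)  # one forward pass
        h_in = list(h_out)      # copy hidden output back as input
        c_in = list(c_out)      # copy cell output back as input
        outputs.append(y)
    return outputs
```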

I wish I could show you some code, but what I have is part of a proprietary codebase, and involves our own wrapper around the TRT network (for example, to manage recurrent connections), which would probably be more distracting than helpful.

Parsing the weights is its own challenge too, but not as difficult as this other crap :)

Good luck,
Tom

I realize this is an ancient post at this point (more than a year old), but another approach (which I personally have not used) may be to use the onnx-tensorrt parser, if you can convert your model to ONNX. That parser does know how to import RNN layers, but it might still need a bit of TLC on your part. I don’t know how it would handle Caffe-style “Tile” layers, but plugin layers might be able to help with those. You may also need to do a little manual work to deal with the recurrent connections, but maybe not.