Parsing Caffe model with LSTM layer

Good morning,

I am trying to convert a Caffe model to TensorRT.
However, the Caffe parser does not support the LSTM layer. On the other hand, TensorRT has its own LSTM layer:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_302/tensorrt-developer-guide/index.html#layers

My question therefore is: is it possible to parse the Caffe model and add the LSTM layer in some way, or do I have to write a custom IPlugin for that layer?

Thank you!

Hi Daniele,

Unfortunately, the nvidia caffe parser isn’t going to help you here. You’re going to have to write your own. Parsing the Caffe LSTM layers into TensorRT is a little tricky (I’ve done it), but it’s not impossible. I’d suggest you look at TensorRT’s builder API and at the Caffe implementation to see how that works.

Good luck,
Tom

Hi Tom,

thanks for your answer. I expected that, but I had some hope…

By the way, I have some experience writing custom IPlugins, but I have only handled simple layers, e.g. sigmoid and reshape. I am a bit scared of the LSTM.

I was thinking of starting from the Caffe implementation itself, https://github.com/BVLC/caffe/blob/master/src/caffe/layers/lstm_layer.cpp

but I’d rather avoid including all the Caffe directories.

May I ask what your experience was? Did you develop the layer based on the link above, using other Caffe layers, or did you write the whole layer step by step?

Thank you for your answer!

Daniele

Hi Daniele,

If I get some time, I’ll try to provide more details, but for now I’ll try to sketch my approach.

I didn’t include any Caffe directories, but I drew inspiration from their implementation. Caffe itself doesn’t have a “pure” LSTM layer; instead they implement it from other layers. I’d suggest taking a look at the graph they produce for LSTMs.

In order to do this for TensorRT, I used their “builder” API and preprocessed my prototxt to expand the LSTM node into a graph somewhat similar to the Caffe one (this involved inserting a bunch of layers: a reshape, a “delta product”, an eltwise, a tile, and maybe others). Tile is not supported in TRT, so I wrote a custom layer (simple implementations are not terribly difficult). One further trickiness is handling the state of the LSTM – you have to connect certain outputs (of the LSTM layer) back as inputs (to the LSTM layer) in order to set up the recurrent structure. I managed that in our own wrapper of the TRT network.
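To make the prototxt-surgery idea concrete, here is a toy Python sketch (not Tom’s actual code; the layer names, the `__` suffixes, and the particular expansion are all illustrative assumptions). It rewrites each flat LSTM layer block into a small subgraph of sublayers; a real implementation would parse the prototxt with the caffe.proto text-format parser rather than regexes, and the expansion would mirror Caffe’s actual unrolled graph.

```python
import re

def expand_lstm(prototxt: str) -> str:
    """Toy 'prototxt surgery': replace each LSTM layer block with an
    expanded subgraph (reshape -> tile -> eltwise -> lstm core).
    Assumes flat layer blocks with no nested braces."""
    pattern = re.compile(r'layer\s*\{[^{}]*type:\s*"LSTM"[^{}]*\}', re.DOTALL)

    def replacement(match):
        block = match.group(0)
        name = re.search(r'name:\s*"([^"]+)"', block).group(1)
        bottoms = re.findall(r'bottom:\s*"([^"]+)"', block)
        top = re.search(r'top:\s*"([^"]+)"', block).group(1)
        x, delta = bottoms[0], bottoms[1]
        # Hypothetical expansion; double underscores mark inserted nodes.
        return "\n".join([
            f'layer {{ name: "{name}__reshape" type: "Reshape" '
            f'bottom: "{x}" top: "{name}__xr" }}',
            f'layer {{ name: "{name}__tile" type: "Tile" '
            f'bottom: "{delta}" top: "{name}__dt" }}',
            f'layer {{ name: "{name}__gate" type: "Eltwise" '
            f'bottom: "{name}__xr" bottom: "{name}__dt" '
            f'top: "{name}__gated" }}',
            f'layer {{ name: "{name}" type: "LSTM" '
            f'bottom: "{name}__gated" top: "{top}" }}',
        ])

    return pattern.sub(replacement, prototxt)
```

Feeding the rewritten prototxt to the parser then only requires custom plugins for the inserted sublayers that TRT lacks (here, Tile).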

Good luck,
Tom

Hi Tom,

thank you for your answer. I appreciate your effort.

What worries me the most is the insertion of a bunch of layers. In my previous experience, writing a custom IPlugin for a layer not supported by the Caffe parser was straightforward, e.g. writing an ad-hoc Softmax IPlugin.

Because this is how I understand the IPlugin customization:

  • You parse the prototxt file
  • Oops, you find a non-supported layer. OK, so let’s write a custom IPlugin layer.

So I have an LSTM layer in the prototxt, with certain inputs and outputs. The custom plugin should be of the same type. I understand that the LSTM cell has operations that need to be modeled.

But if I understand the graph-expansion process correctly, you suggest modifying the structure of the network itself, i.e. decomposing the LSTM in the prototxt into its subparts and THEN parsing the model, having written custom implementations of the non-supported sublayers?

Thank you a lot, I feel I am getting close to what I have to do.

Daniele

Hi Daniele,

But if I understand the graph-expansion process correctly, you suggest modifying the structure of the network itself, i.e. decomposing the LSTM in the prototxt into its subparts and THEN parsing the model, having written custom implementations of the non-supported sublayers?

That is indeed how I did this – it’s certainly not the only possible approach. Just a little surgery on a graph. Caffe does their own surgery too.

Here are some diagrams which showed the local transformation I did: https://gist.github.com/tdp2110/f612f7007f7fa9fc81104affe63da4c0

(By the way, @NVES, your upload image feature doesn’t work)

The delta input, I believe, was a scalar for clearing the memory. The second, larger image shows my processed proto (the processing was done in memory), and only in that network does the lstm node correspond to a network->addRNNv2 call. The double underscores in the diagram mark nodes that were not present in the original proto. The tile is a custom layer and is not too hard to implement.
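As a point of reference for how simple the Tile operation itself is, here is a pure-Python sketch of a Caffe-style Tile (repeat a blob N times along an axis), using nested lists in place of device buffers; a CUDA plugin version is essentially the same indexed copy in a kernel:

```python
def tile(data, axis, tiles):
    """Caffe-style Tile: repeat the nested-list blob `tiles` times
    along dimension `axis`. Illustrative host-side sketch only."""
    if axis == 0:
        return [row for _ in range(tiles) for row in data]
    return [tile(row, axis - 1, tiles) for row in data]
```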

As I mentioned before, unlike Caffe, the TRT RNN layer does not manage the recurrent connections for you: you have to feed the cell/hidden outputs back as inputs (e.g. using some sort of cudaMemcpy) after your calls to forward on your network. (You’ll also need to initialize those LSTM cell/hidden inputs, probably to zero.) This is probably why their nvcaffe_parser doesn’t handle this type of layer.
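The state-plumbing loop can be sketched in plain Python. Here `step` stands in for one forward call on the engine, and the list copies stand in for the device-to-device cudaMemcpy of the hidden/cell bindings; the function name and shapes are illustrative assumptions, not a TRT API:

```python
def run_sequence(step, inputs, state_size):
    """Drive a stateless RNN step over a sequence, manually feeding the
    hidden/cell outputs of each forward pass back in as the next inputs."""
    h_in = [0.0] * state_size   # initialize hidden state to zero
    c_in = [0.0] * state_size   # initialize cell state to zero
    outputs = []
    for x in inputs:
        y, h_out, c_out = step(x, h_in, c_in)  # one forward pass
        h_in = list(h_out)      # copy hidden output back as input
        c_in = list(c_out)      # copy cell output back as input
        outputs.append(y)
    return outputs
```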

I wish I could show you some code, but what I have is part of a proprietary codebase, and involves our own wrapper around the TRT network (for example, to manage recurrent connections), which would probably be more distracting than helpful.

Parsing the weights is its own challenge too, but not as difficult as this other crap :)

Good luck,
Tom

I realize this is an ancient post at this point (more than a year old), but another approach (which I personally have not used) may be to use the onnx-tensorrt parser, if you can convert your model to ONNX. That parser does know how to import RNN layers, but it might still need a bit of TLC on your part. I don’t know how it would handle Caffe-style “Tile” layers, but plugin layers might be able to help with those. You may also need to do a little manual work to deal with the recurrent connections, but maybe not.