Dear All,

I’m trying to use cudnnRNNForwardInference() for LSTM network inference.

It appears that cudnnRNNMode_t = CUDNN_LSTM supports only LSTMs without peephole connections.

Given that, I have a few questions.

Q1: Is there a way to fit an LSTM network with peephole connections into this CUDNN_LSTM mode? Any workarounds?

Q2: How would you handle LSTM with a recurrent projection layer + optional non-recurrent projection layer?

Best regards, Hak

Note: the below is true as of cuDNN v6. It may or may not be true for future versions of cuDNN.

Taking the peephole formulation from https://arxiv.org/abs/1503.04069 (Appendix A):

If you break the LSTM into single timesteps, you can compute the peephole terms manually and then pass them in as “biases”. You’d have to maintain the peephole weights yourself and use the cy values returned by cuDNN.

Unfortunately splitting the LSTM into single steps and calling additional kernels does reduce the expected performance somewhat.
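To illustrate the “biases” idea, here is a minimal sketch in plain Python (not cuDNN calls) of folding a peephole term that depends on c(t-1) into a per-timestep bias. All names (`gate`, `peephole_gate`, `fold_peephole_into_bias`) and all numbers are made up for illustration. It assumes diagonal peephole weights as in Appendix A of the paper, covers only gates whose peephole uses c(t-1) (input and forget), and assumes a single sequence at a time, since one bias vector cannot carry a different cell state per batch element.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gate(W, R, b, x, h_prev):
    """sigma(W*x + R*h(t-1) + b): the peephole-free gate form."""
    wx, rh = matvec(W, x), matvec(R, h_prev)
    return [sigmoid(a + r + bias) for a, r, bias in zip(wx, rh, b)]

def peephole_gate(W, R, b, p, x, h_prev, c_prev):
    """sigma(W*x + R*h(t-1) + p ◦ c(t-1) + b): gate with a diagonal peephole."""
    wx, rh = matvec(W, x), matvec(R, h_prev)
    return [sigmoid(a + r + pk * ck + bias)
            for a, r, pk, ck, bias in zip(wx, rh, p, c_prev, b)]

def fold_peephole_into_bias(b, p, c_prev):
    """Per-timestep bias b + p ◦ c(t-1); must be recomputed before every step."""
    return [bias + pk * ck for bias, pk, ck in zip(b, p, c_prev)]

# Toy sizes: 2 inputs, 2 hidden units. All values are invented.
W = [[0.1, -0.2], [0.3, 0.4]]
R = [[0.05, 0.0], [-0.1, 0.2]]
b = [0.01, -0.02]
p = [0.5, -0.7]        # peephole weights, maintained outside the library
x = [1.0, 2.0]
h_prev = [0.3, -0.4]
c_prev = [0.8, 0.6]    # stands in for cy from the previous single-step call

expected = peephole_gate(W, R, b, p, x, h_prev, c_prev)
folded = gate(W, R, fold_peephole_into_bias(b, p, c_prev), x, h_prev)
```

Here `folded` and `expected` agree up to floating-point rounding, which is why the peephole term can ride along in the bias as long as the bias is refreshed from cy before each single-step call.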

I’m not sure what you mean by “non-recurrent” projection layer, but I guess that can be called outside of cuDNN without any issues. It may be you need to call cuDNN one layer at a time.

Recurrent projection layers are tricky. With heavy tweaking you can get them to work. It involves splitting the problem into one call per timestep again and doing the projection manually. Because the recurrent matrices are no longer square, you can no longer use them, so you have to somehow feed the hidden state through the layer-to-layer connection instead. This is possible, but as before, the more tricks like this you use, the less performance gain you’d expect to see over simpler implementations.
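As a reference for what the per-timestep approach has to compute, here is a hedged sketch of an LSTM with a recurrent projection in plain Python. It shows only the math, not the cuDNN calls or the layer-to-layer trick. The names `lstm_step`, `run_lstmp`, `filled`, and the projection matrix `P` are hypothetical, and the weights are arbitrary filler values.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(x, r_prev, c_prev, params):
    """One LSTM step whose recurrent input is the projected state r(t-1)."""
    pre = {}
    for g in ("i", "f", "o", "c"):
        W, R, b = params[g]   # R is hidden x proj, so it is not square
        pre[g] = [a + r + bias
                  for a, r, bias in zip(matvec(W, x), matvec(R, r_prev), b)]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    c_new = [math.tanh(v) for v in pre["c"]]
    c = [fk * ck + ik * nk for fk, ck, ik, nk in zip(f, c_prev, i, c_new)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstmp(xs, params, P, hidden_size):
    """Loop over timesteps, projecting h(t) to r(t) = P*h(t) after each step."""
    r = [0.0] * len(P)
    c = [0.0] * hidden_size
    outputs = []
    for x in xs:
        h, c = lstm_step(x, r, c, params)
        r = matvec(P, h)      # manual projection between single-step calls
        outputs.append(r)
    return outputs, c

def filled(rows, cols, scale):
    """Deterministic filler weights, only for the demo."""
    return [[scale * (r - c) for c in range(cols)] for r in range(rows)]

hidden, proj, inp = 3, 2, 2
params = {g: (filled(hidden, inp, 0.1), filled(hidden, proj, 0.2), [0.0] * hidden)
          for g in "ifoc"}
P = filled(proj, hidden, 0.3)
xs = [[1.0, 0.5], [0.2, -0.3], [0.0, 1.0]]
outputs, c_final = run_lstmp(xs, params, P, hidden)
```

The loop makes the cost visible: the projection sits between consecutive steps, so the sequence cannot be handed over in one call.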

Thanks for the response. With **cudnnRNNMode_t = CUDNN_LSTM**, the **cudnnRNNForwardInference** routine evaluates the LSTM using the following equations:

i(t) = σ(Wi*x(t) + Ri*h(t-1) + bWi + bRi)

f(t) = σ(Wf*x(t) + Rf*h(t-1) + bWf + bRf)

o(t) = σ(Wo*x(t) + Ro*h(t-1) + bWo + bRo)

c’(t) = tanh(Wc*x(t) + Rc*h(t-1) + bWc + bRc)

c(t) = f(t)◦c(t-1) + i(t)◦c’(t)

h(t) = o(t)◦tanh(c(t))

Q1: Given that, how would one order the trained parameters to be compatible with the **(x, hx, cx, w, y, hy, cy)** argument list? Would it be something like w=[Wi Ri bWi bRi Wf Rf bWf bRf Wo Ro bWo bRo Wc Rc bWc bRc], hx = h(t-1), cx = c(t-1), y = o(t), hy = h(t), cy = c(t)?

Q2: How would this ordering differ for the **bidirectional LSTM** case (i.e., cudnnDirectionMode_t=CUDNN_BIDIRECTIONAL)?
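One way to check a candidate parameter layout is to compare the library's output against a small reference step written directly from the equations listed above. The sketch below is such a reference in plain Python. The function name `lstm_reference_step` and the `params` dictionary layout are hypothetical, and it says nothing about how cuDNN actually packs w. Rather than assuming a packing order, one option is to ask the library where each matrix and bias lives, e.g. via cudnnGetRNNLinLayerMatrixParams and cudnnGetRNNLinLayerBiasParams.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_reference_step(x, h_prev, c_prev, params):
    """One step of the equations above; params[g] = (W, R, bW, bR)."""
    def pre(g):
        W, R, bW, bR = params[g]
        return [a + r + u + v for a, r, u, v
                in zip(matvec(W, x), matvec(R, h_prev), bW, bR)]
    i = [sigmoid(v) for v in pre("i")]
    f = [sigmoid(v) for v in pre("f")]
    o = [sigmoid(v) for v in pre("o")]
    c_new = [math.tanh(v) for v in pre("c")]
    c = [fk * ck + ik * nk for fk, ck, ik, nk in zip(f, c_prev, i, c_new)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Hand-checkable case: one unit, all weights and biases zero, so every
# sigmoid gate is 0.5 and the candidate c'(t) is tanh(0) = 0.
zero = ([[0.0]], [[0.0]], [0.0], [0.0])
params = {g: zero for g in "ifoc"}
h, c = lstm_reference_step([1.0], [0.0], [2.0], params)
```

In this case c(t) is half of c(t-1) and h(t) is 0.5·tanh(c(t)), which is easy to verify by hand before moving on to real trained weights.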

Sorry for commenting a bit late but I just saw the post.

I did not fully understand the proposed solution for supporting peepholes in LSTM with cuDNN.

It seems to me that the only way to do this is to implement your own “myRNNForwardInference” that performs all the LSTM computations step by step, perhaps with cuBLAS primitives, because using cudnnRNNForwardInference would not be possible even with the “biases” trick.

In fact, in the LSTM formulation of the paper “LSTM: A Search Space Odyssey” (https://arxiv.org/pdf/1503.04069.pdf), o(t) uses the cell state c(t), not c(t-1). Since c(t) is not available until cudnnRNNForwardInference has finished, it seems to me that the “biases” trick cannot be used with the cudnnRNNForwardInference API. Conversely, the API could be used if every gate used the cell state c(t-1), but unfortunately that is not the case.
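The dependency described above can be made concrete with a single-unit sketch in plain Python. The function `peephole_step`, its flag, and the scalar weights are hypothetical and chosen only for illustration. The step computes c(t) first and then feeds it into the output gate. Switching the flag substitutes c(t-1) instead, which is all a bias prepared before the call could contain.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def peephole_step(x, h_prev, c_prev, w, output_peephole_uses_new_cell=True):
    """Single-unit peephole LSTM step; w maps names to scalar weights."""
    # Input and forget gates peek at c(t-1), which is known before the step.
    i = sigmoid(w["Wi"] * x + w["Ri"] * h_prev + w["pi"] * c_prev + w["bi"])
    f = sigmoid(w["Wf"] * x + w["Rf"] * h_prev + w["pf"] * c_prev + w["bf"])
    z = math.tanh(w["Wc"] * x + w["Rc"] * h_prev + w["bc"])
    c = f * c_prev + i * z
    # The output gate peeks at c(t), which only exists after the line above.
    c_for_o = c if output_peephole_uses_new_cell else c_prev
    o = sigmoid(w["Wo"] * x + w["Ro"] * h_prev + w["po"] * c_for_o + w["bo"])
    h = o * math.tanh(c)
    return h, c

names = ["Wi", "Ri", "pi", "bi", "Wf", "Rf", "pf", "bf",
         "Wc", "Rc", "bc", "Wo", "Ro", "po", "bo"]
w = {name: 0.5 for name in names}   # arbitrary demo weights

h_exact, c_exact = peephole_step(1.0, 0.2, 0.9, w, True)
h_approx, c_approx = peephole_step(1.0, 0.2, 0.9, w, False)
```

Both variants produce the same c(t), but h(t) differs whenever the output peephole weight is nonzero and c(t) differs from c(t-1), so the substitution is an approximation rather than an equivalent rewrite.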