Hello,

I will first summarize what I *think* I understood about the cuDNN 5.1 RNN functions:

```
x = [seq_length, batch_size, vocab_size] # input
y = [seq_length, batch_size, hiddenSize] # output
dx = [seq_length, batch_size, vocab_size] # input gradient
dy = [seq_length, batch_size, hiddenSize] # output gradient
hx = [num_layer, batch_size, hiddenSize] # input hidden state
hy = [num_layer, batch_size, hiddenSize] # output hidden state
cx = [num_layer, batch_size, hiddenSize] # input cell state
cy = [num_layer, batch_size, hiddenSize] # output cell state
dhx = [num_layer, batch_size, hiddenSize] # input hidden state gradient
dhy = [num_layer, batch_size, hiddenSize] # output hidden state gradient
dcx = [num_layer, batch_size, hiddenSize] # input cell state gradient
dcy = [num_layer, batch_size, hiddenSize] # output cell state gradient
w = [param size] # parameters (weights & bias)
dw = [param size] # parameters gradients
```
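For concreteness, here is how I picture those shapes in a small NumPy sketch (the variable names mirror my list above, not the cuDNN API, and the sizes are my toy configuration):

```python
import numpy as np

seq_length, batch_size, vocab_size = 3, 1, 255
num_layer, hidden_size = 2, 20

x  = np.zeros((seq_length, batch_size, vocab_size))   # input
y  = np.zeros((seq_length, batch_size, hidden_size))  # output
hx = np.zeros((num_layer, batch_size, hidden_size))   # input hidden state
cx = np.zeros((num_layer, batch_size, hidden_size))   # input cell state

# every gradient tensor has the same shape as the tensor it corresponds to
dx, dy   = np.zeros_like(x),  np.zeros_like(y)
dhx, dcx = np.zeros_like(hx), np.zeros_like(cx)

print(x.shape, y.shape, hx.shape)
```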

cudnnRNNForwardTraining/cudnnRNNForwardInference

```
input: x, hx, cx, w
output: y, hy, cy
```

cudnnRNNBackwardData

```
input: y, dy, dhy, dcy, w, hx, cx
output: dx, dhx, dcx
```

cudnnRNNBackwardWeights

```
input: x, hx, y, dw (dw is read too: the result is accumulated into it)
output: dw
```
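One detail worth stressing: as I understand it, cudnnRNNBackwardWeights *accumulates* into dw rather than overwriting it (which is why dw shows up as both input and output), so dw must be zeroed between updates. A toy NumPy illustration of the difference (the gradient values are made up):

```python
import numpy as np

grad_batch1 = np.array([1.0, 1.0, 1.0, 1.0])
grad_batch2 = np.array([2.0, 2.0, 2.0, 2.0])

# accumulate-style, zeroing between iterations -> correct per-batch gradient
dw = np.zeros(4)
dw += grad_batch1
dw = np.zeros_like(dw)   # "dw = 0" step between iterations
dw += grad_batch2
print(dw)                # [2. 2. 2. 2.], as intended

# forgetting to zero: the previous batch's gradient leaks into this step
dw_bad = np.zeros(4)
dw_bad += grad_batch1
dw_bad += grad_batch2
print(dw_bad)            # [3. 3. 3. 3.]
```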

**Questions:**

- Is the following training workflow for a multi-layer RNN (*num_layer* > 1) correct?

```
1. init hx, cx, dhy, dcy to NULL
2. init w (weights: small random values, biases: 1)
3. forward
4. backward data
5. backward weights
6. update weights: w += dw
7. dw = 0
8. goto 3
```
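A side note on the update step: if dw is the gradient of the loss, the update needs a negative sign and a step size (w -= lr * dw), otherwise the loss grows instead of shrinking, which could explain the non-convergence I see. A tiny self-contained sketch of the same loop on a one-parameter least-squares problem (plain Python, no cuDNN):

```python
# minimize L(w) = (w - 3)^2 with the loop above: forward, backward, update, repeat
w, lr = 0.0, 0.1
for step in range(100):
    loss = (w - 3.0) ** 2   # "forward"
    dw = 2.0 * (w - 3.0)    # "backward"
    w -= lr * dw            # update: note the MINUS sign; w += dw diverges here
    # dw is recomputed from scratch each iteration, i.e. the "dw = 0" step
print(w)
```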

- Do you confirm that cuDNN already implements a stacked RNN when *num_layer* > 1? (i.e., there is no need to call the forward/backward methods *num_layer* times)
- Should I re-inject the hidden state & cell state into the network at the next batch?
- The output in the LSTM formulas is *hy*. Should I use *hy* or *y* as the output?
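On that last question, my understanding (worth verifying against the docs) is: *y* holds the top layer's hidden state at *every* time step, while *hy* holds the *last* time step's hidden state for every layer, so the last step of *y* coincides with the last layer of *hy*. A toy stacked vanilla-RNN sketch in NumPy (tanh cells instead of LSTM, for brevity) illustrating the relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
T, B, I, H, L = 3, 1, 4, 5, 2          # seq len, batch, input, hidden, layers

x  = rng.normal(size=(T, B, I))
Wx = [rng.normal(size=(I if l == 0 else H, H)) * 0.1 for l in range(L)]
Wh = [rng.normal(size=(H, H)) * 0.1 for l in range(L)]

h = np.zeros((L, B, H))                # hx = 0
y = np.empty((T, B, H))
for t in range(T):
    inp = x[t]
    for l in range(L):
        h[l] = np.tanh(inp @ Wx[l] + h[l] @ Wh[l])
        inp = h[l]                     # layer l feeds layer l+1
    y[t] = h[-1]                       # y stores the TOP layer at each step
hy = h                                 # hy stores the FINAL step of each layer

print(np.allclose(y[-1], hy[-1]))      # True: top layer, last step coincide
```

So for a prediction based only on the final time step, *y[-1]* and the top layer of *hy* should be interchangeable; *y* is what you want when the loss is applied at every time step.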

I am experimenting on a toy data set (x = a sentence repeated a few times, trying to predict the next letter); so far, the loss never converges.

network: input -> lstm -> fully connected -> softmax

```
batchSize = 1
sequenceLength = 3
hiddenSize = 20
numLayers = 2
vocabSize/inputSize = 255
```
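For what it's worth, here is a rough parameter-count sketch for this configuration, assuming (as I believe cuDNN 5.1 does for LSTM) 4 input-side and 4 recurrent weight matrices plus 8 bias vectors per layer; cudnnGetRNNParamsSize is the authoritative source, so this is only a cross-check:

```python
def lstm_param_count(input_size, hidden, layers):
    """Rough cuDNN-style LSTM parameter count (my assumption, not the API)."""
    total = 0
    for l in range(layers):
        in_sz = input_size if l == 0 else hidden
        total += 4 * hidden * in_sz    # input-side weight matrices (W)
        total += 4 * hidden * hidden   # recurrent weight matrices (R)
        total += 8 * hidden            # two bias vectors per gate
    return total

print(lstm_param_count(255, 20, 2))
```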