Forward pass using weights of tf.compat.v1.keras.layers.CuDNNLSTM

o Ubuntu 18.04.2 LTS
o GPU: GeForce GTX 980
o NVIDIA driver version: 410.78
o CUDA version: 10
o Python version: 3.7.3
o TensorFlow version: 1.14.0

Hi,

I’m currently working on voice recognition using an LSTM with Keras in TensorFlow.
I’m using tf.compat.v1.keras.layers.CuDNNLSTM, and I’m having trouble
replicating the output of the trained model from the weights
that TensorFlow gives me. I’m assuming that the LSTM model that
cuDNN implements is the following:
https://wikimedia.org/api/rest_v1/media/math/render/svg/2db2cba6a0d878e13932fa27ce6f3fb71ad99cf1
where ∘ denotes the Hadamard product
(element-wise product).
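
Written out, I believe that image shows the standard LSTM equations:

i_t = σ_g(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ_g(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ_g(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = σ_c(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t
h_t = o_t ∘ σ_h(c_t)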

First of all, I would like to know whether σ_g is the sigmoid function
1/(1+exp(-x)), whether σ_c is the hyperbolic tangent, and whether σ_h is
also the hyperbolic tangent.
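
That is, my assumption is:

σ_g(x) = 1 / (1 + exp(-x)),    σ_c(x) = σ_h(x) = tanh(x)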

Then, I would like to know if it’s okay to assume that the order of the
weights in the kernel and recurrent-kernel arrays is i, f, c, o. If not,
could you please tell me the correct order?
Also, I’m assuming that the order of the eight biases is wi, wf, wc,
wo, ui, uf, uc, uo, where b_i = b_wi + b_ui, and so on. If this is wrong,
could you please tell me the correct order?
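
In code, the layout I’m assuming looks like this (a minimal sketch; the i, f, c, o ordering and the bias split are exactly the assumptions I’m asking about, and split_cudnn_lstm_weights is just an illustrative helper):

import numpy as np

def split_cudnn_lstm_weights(kernel, recurrent_kernel, bias, units):
    # Assumed shapes:
    #   kernel:           (input_dim, 4*units), columns ordered i, f, c, o
    #   recurrent_kernel: (units, 4*units), same column order
    #   bias:             (8*units,) = [b_wi, b_wf, b_wc, b_wo, b_ui, b_uf, b_uc, b_uo]
    W = {g: kernel[:, k * units:(k + 1) * units] for k, g in enumerate("ifco")}
    U = {g: recurrent_kernel[:, k * units:(k + 1) * units] for k, g in enumerate("ifco")}
    # cuDNN keeps separate input and recurrent biases, so the effective gate
    # bias would be their sum, e.g. b_i = b_wi + b_ui (assumed).
    b = {g: bias[k * units:(k + 1) * units] + bias[(4 + k) * units:(5 + k) * units]
         for k, g in enumerate("ifco")}
    return W, U, b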

Any other advice is welcome.

Thank you very much in advance.

Best regards

Nicolas Grageda U.

nicolas.grageda.u@gmail.com

Hi,

Please refer to the links below for more info:
https://developer.nvidia.com/discover/lstm
https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/
https://github.com/NVIDIA-developer-blog/code-samples/tree/master/posts/rnn

Thanks

Hi,

I’ve already seen the links you gave me, but none of them answers my questions. Could you please be more specific and give more details in your answer?

Thank you

The model of LSTM I’m trying to replicate is this one:

import tensorflow as tf

# n_coeffs_input, n_coeffs_output, batch_size and nodos (the number of units)
# are defined elsewhere.
X = tf.placeholder(tf.float32, [None, None, n_coeffs_input])
Y = tf.placeholder(tf.float32, [None, None, n_coeffs_output])
l_r = tf.placeholder(tf.float32, [])
utts_len = tf.placeholder(tf.float32, [batch_size])

lstm1 = tf.compat.v1.keras.layers.CuDNNLSTM(nodos, return_sequences=True, return_state=True)

output_lstm1 = lstm1(inputs=X)
# output_lstm1 is a list of length 3: the first element is the LSTM output for
# all timesteps, the second is the last output, and the third is the last state.

mse = tf.reduce_mean(tf.squared_difference(Y, output_lstm1[0]))
...

The function I defined to replicate the LSTM uses the i, f, c, o order (sigmoide_np below is the elementwise sigmoid):

import numpy as np

def sigmoide_np(x):
    # Elementwise sigmoid: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def NewLSTM(kernel, recurrent_kernel, Bias, x, previous_output, previous_state, units=1):
    # Gate pre-activations: W x_t + U h_{t-1} + (input bias + recurrent bias)
    k_gates = np.dot(np.array([x]), kernel)
    kr_gates = np.dot(previous_output, np.array([recurrent_kernel]))
    gates = k_gates + kr_gates + Bias[:(units * 4)] + Bias[(units * 4):]
    # Split the pre-activations assuming i, f, c, o column order
    i = sigmoide_np(gates[:, :units])
    f = sigmoide_np(gates[:, units:(units * 2)])
    c = np.tanh(gates[:, (units * 2):(units * 3)])
    o = sigmoide_np(gates[:, (units * 3):])
    # New cell state and output
    state = np.multiply(i, c) + np.multiply(previous_state, f)
    out = o * np.tanh(state)
    return state, out
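
One way to sanity-check this is to unroll NewLSTM over a whole sequence on its own, feeding back its own output and state (a minimal sketch; kernel, recurrent_kernel and Bias are assumed to come from lstm1.get_weights(), x_in is the input sequence and units the layer width):

# Sketch: unroll NewLSTM over the sequence, feeding back its own h and c.
kernel, recurrent_kernel, Bias = lstm1.get_weights()
h = np.zeros(units)   # previous output, zero at t = 0
c = np.zeros(units)   # previous cell state, zero at t = 0
outputs = []
for t in range(len(x_in)):
    c, h = NewLSTM(kernel, recurrent_kernel, Bias, x_in[t], h, c, units)
    c, h = c[0], h[0]  # NewLSTM returns (1, units) arrays
    outputs.append(h)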

To test the replication for an LSTM with n_coeffs_input = 1, I run:

total_out = net.run(output_lstm1, feed_dict={X: batch_in, utts_len: [10]})
last_out = total_out[1]
last_state = total_out[2]
total_out = np.squeeze(total_out[0])
replicated_LSTM = []

x_in = np.squeeze(batch_in)

# For the first input we assume a zero last state and a zero last output
h=np.zeros((1,n_coeffs_output))
c=np.zeros((1,n_coeffs_output))


state, outI_t_1 = NewLSTM(lstm1.get_weights()[0], lstm1.get_weights()[1],
                lstm1.get_weights()[2],
                x_in[0], h[0, :], c[0, :])
replicated_LSTM.append(outI_t_1)



# Re-run the network on truncated inputs to obtain the previous output and state
for last in range(-9, 0):

    outI = net.run(output_lstm1, feed_dict={X: np.reshape(batch_in[0, :last, :], (1, 10 + last, n_coeffs_input)), utts_len: [10 + last]})
    last_out = outI[1]
    last_state = outI[2]

    state, outI_t_1 = NewLSTM(lstm1.get_weights()[0], lstm1.get_weights()[1],
                lstm1.get_weights()[2],
                x_in[last], last_out[0], last_state[0], 1)
    replicated_LSTM.append(outI_t_1)

print("output with net.run:")
for i in total_out:
    print(i)
print("Output with NewLSTM:")
for i in replicated_LSTM:
    print(i)

The output for n_coeffs_input=1 and n_coeffs_output=1 is:

output with net.run:
0.6079306
0.88047564
0.9457015
0.9593791
0.9630412
0.96477175
0.9660919
0.96729606
0.96844375
0.96954805
Output with NewLSTM:
[[0.60793061]]
[[0.88047569]]
[[0.94570145]]
[[0.95937899]]
[[0.96304125]]
[[0.96477178]]
[[0.96609195]]
[[0.96729607]]
[[0.96844372]]
[[0.96954812]]

As you can see, the results are very similar, so we could say the replication is good. But when I use n_coeffs_output=2, I get very different results:

output with net.run:
[[-1.6811120e-05 -7.6083797e-01]
 [-5.3793984e-04 -9.5489222e-01]
 [-4.9301507e-03 -9.7488159e-01]
 [-2.7245175e-02 -9.5495248e-01]
 [-1.0828623e-01 -9.0503311e-01]
 [-2.8274482e-01 -8.2405442e-01]
 [-4.7703674e-01 -7.3043454e-01]
 [-6.2741107e-01 -6.1664152e-01]
 [-7.3953187e-01 -4.4329706e-01]
 [-8.0043334e-01 -1.7720392e-01]]
Output with NewLSTM:
[[-1.68111220e-05 -7.60837996e-01]]
[[-4.95224298e-04 -9.54892126e-01]]
[[-0.00478779 -0.97487646]]
[[-0.02719039 -0.95483796]]
[[-0.10874437 -0.90371213]]
[[-0.29537781 -0.81501876]]
[[-0.52451342 -0.69765331]]
[[-0.69437694 -0.54462647]]
[[-0.79712232 -0.31257828]]
[[-0.85020044 -0.08377024]]

The equations are documented here: https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html under the CUDNN_LSTM mode.
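
For reference, the CUDNN_LSTM equations on that page are, as far as I recall (σ is the sigmoid and ∘ the elementwise product):

i_t = σ(W_i x_t + R_i h_{t-1} + b_Wi + b_Ri)
f_t = σ(W_f x_t + R_f h_{t-1} + b_Wf + b_Rf)
o_t = σ(W_o x_t + R_o h_{t-1} + b_Wo + b_Ro)
c'_t = tanh(W_c x_t + R_c h_{t-1} + b_Wc + b_Rc)
c_t = f_t ∘ c_{t-1} + i_t ∘ c'_t
h_t = o_t ∘ tanh(c_t)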
Hope it helps.

Thanks