Some questions about LSTM optimization

I read the LSTM developer blog post in the link, and I also read the code on GitHub, which is in the link.

I have some questions:
1. I don't understand the benefit of pre-transposing the weight matrix.
2. I think the LSTM code on GitHub assumes that input_dim = hidden_size. Is my understanding correct?