I am building a semi-supervised learning system with GA + LSTM. I need to build a multi-layer LSTM network with cuDNN, which contains several LSTM layers and a softmax output layer. I found an example of how to build a single LSTM layer with cuDNN.
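Not the example I found, but as an illustration of the direction I'm taking: below is a minimal sketch of a stacked (multi-layer) LSTM configured through the cuDNN 8 RNN descriptor API. All sizes, variable names, and the CHECK_CUDNN macro are placeholder assumptions, not values from a real model.

```c
/* Rough sketch only: descriptor setup for a stacked LSTM via the cuDNN 8
 * RNN API. Sizes and names below are placeholder assumptions. */
#include <cuda_runtime.h>
#include <cudnn.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK_CUDNN(call)                                            \
    do {                                                             \
        cudnnStatus_t s_ = (call);                                   \
        if (s_ != CUDNN_STATUS_SUCCESS) {                            \
            fprintf(stderr, "cuDNN error: %s (line %d)\n",           \
                    cudnnGetErrorString(s_), __LINE__);              \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main(void) {
    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    /* Placeholder dimensions. */
    const int inputSize  = 64;   /* features per time step      */
    const int hiddenSize = 128;  /* hidden units per LSTM layer */
    const int numLayers  = 3;    /* stacked LSTM layers         */

    /* The RNN descriptor requires a dropout descriptor even when
     * dropout is disabled (probability 0). */
    cudnnDropoutDescriptor_t dropoutDesc;
    CHECK_CUDNN(cudnnCreateDropoutDescriptor(&dropoutDesc));
    size_t stateSize = 0;
    CHECK_CUDNN(cudnnDropoutGetStatesSize(handle, &stateSize));
    void *dropoutStates = NULL;
    cudaMalloc(&dropoutStates, stateSize);
    CHECK_CUDNN(cudnnSetDropoutDescriptor(dropoutDesc, handle, 0.0f,
                                          dropoutStates, stateSize, 0));

    /* Multi-layer LSTM: CUDNN_LSTM cell, numLayers stacked layers. */
    cudnnRNNDescriptor_t rnnDesc;
    CHECK_CUDNN(cudnnCreateRNNDescriptor(&rnnDesc));
    CHECK_CUDNN(cudnnSetRNNDescriptor_v8(
        rnnDesc,
        CUDNN_RNN_ALGO_STANDARD,
        CUDNN_LSTM,
        CUDNN_RNN_DOUBLE_BIAS,
        CUDNN_UNIDIRECTIONAL,
        CUDNN_LINEAR_INPUT,
        CUDNN_DATA_FLOAT,          /* data type      */
        CUDNN_DATA_FLOAT,          /* math precision */
        CUDNN_DEFAULT_MATH,
        inputSize,
        hiddenSize,
        hiddenSize,                /* projSize == hiddenSize: no projection */
        numLayers,
        dropoutDesc,
        CUDNN_RNN_PADDED_IO_DISABLED));

    /* Weight allocation, data descriptors and cudnnRNNForward()
     * would follow here. */

    CHECK_CUDNN(cudnnDestroyRNNDescriptor(rnnDesc));
    CHECK_CUDNN(cudnnDestroyDropoutDescriptor(dropoutDesc));
    cudaFree(dropoutStates);
    CHECK_CUDNN(cudnnDestroy(handle));
    return 0;
}
```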
You can flatten the LSTM output to a 2-D version to match the input size of the dense layer.
If the original output is 3-D, [N, H, W], then the 2-D version would be [N, K], where N is the batch size and K = H*W is the product of the other dimensions.
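Since the cuDNN RNN output buffer is contiguous in memory, the flatten is only a change of descriptor, not a copy. A minimal sketch, assuming a float buffer and the [N, H, W] shape above (the function and argument names are illustrative):

```c
/* Sketch: flattening a contiguous [N, H, W] buffer into [N, K] with
 * K = H * W needs no data movement -- only the tensor descriptor changes.
 * Function and argument names are illustrative. */
#include <cudnn.h>

/* Describe the existing LSTM output memory as N x (H*W) x 1 x 1, which is
 * the 2-D-style shape a fully connected / softmax stage typically expects. */
static cudnnStatus_t describe_flattened(cudnnTensorDescriptor_t desc,
                                        int n, int h, int w) {
    return cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW,
                                      CUDNN_DATA_FLOAT, n, h * w, 1, 1);
}
```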
It depends on the output dimension of the LSTM layer and the dimension of your desired target.
If the output dim of your LSTM layer doesn’t match the output dim you’re predicting, a linear layer can be used between the LSTM and the softmax layer to bridge the difference.
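cuDNN itself does not provide a dense/linear layer, so one common choice for that bridging layer is a cuBLAS GEMM over the flattened LSTM output. A rough sketch, assuming row-major buffers d_x (N x K), d_w (K x C) and d_y (N x C); all names are placeholders, and the bias add is omitted:

```c
/* Sketch only: a linear layer mapping the flattened LSTM output [N, K]
 * onto the class dimension [N, C] with a single cuBLAS GEMM.
 * Buffer names and sizes are assumptions for illustration. */
#include <cublas_v2.h>

/* y[N x C] = x[N x K] * w[K x C].
 * cuBLAS is column-major, so we compute y^T = w^T * x^T, which lets the
 * row-major buffers be used as-is without any transpose copies. */
static cublasStatus_t linear_forward(cublasHandle_t handle,
                                     const float *d_x,  /* N x K, row-major */
                                     const float *d_w,  /* K x C, row-major */
                                     float *d_y,        /* N x C, row-major */
                                     int N, int K, int C) {
    const float alpha = 1.0f, beta = 0.0f;
    return cublasSgemm(handle,
                       CUBLAS_OP_N, CUBLAS_OP_N,
                       C, N, K,          /* m, n, k of the transposed view */
                       &alpha,
                       d_w, C,           /* w^T is C x K, leading dim C    */
                       d_x, K,           /* x^T is K x N, leading dim K    */
                       &beta,
                       d_y, C);          /* y^T is C x N, leading dim C    */
}
```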
In the case where the output of the LSTM layer is 3-D and the input of the dense layer is 2-D, you can flatten the LSTM output to a 2-D version to match the dense layer's input.
If the output dim of my LSTM layer matches the output dim I’m predicting, no linear layer is needed between the LSTM and softmax layers. In that case, the input to the softmax layer is 3-D; how should I select cudnnSoftmaxMode_t? And is there anything else I need to handle?
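To make the question concrete, here is roughly what I have in mind, assuming I fold the sequence dimension into the batch and describe the logits as [T*N, C, 1, 1] (all names below are placeholder assumptions). With H = W = 1, CUDNN_SOFTMAX_MODE_INSTANCE and CUDNN_SOFTMAX_MODE_CHANNEL would normalize over the same class axis, so I am unsure which one is intended for this case:

```c
/* Sketch only: per-time-step softmax over the class dimension of a 3-D
 * output [T, N, C], by folding time into the batch and describing the
 * logits as a 4-D tensor [T*N, C, 1, 1]. Buffer names are placeholders. */
#include <cudnn.h>

static cudnnStatus_t softmax_over_classes(cudnnHandle_t handle,
                                          const float *d_logits, /* T*N x C */
                                          float *d_probs,        /* T*N x C */
                                          int T, int N, int C) {
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               T * N, C, 1, 1);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnStatus_t s = cudnnSoftmaxForward(handle,
                                          CUDNN_SOFTMAX_ACCURATE,
                                          CUDNN_SOFTMAX_MODE_CHANNEL,
                                          &alpha, desc, d_logits,
                                          &beta,  desc, d_probs);
    cudnnDestroyTensorDescriptor(desc);
    return s;
}
```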