How to build the output layer for a forward LSTM network?

I’m building a forward (unidirectional) LSTM network, but I don’t know how to build the output layer. I connected a dense (fully connected) layer to the LSTM output, but it didn’t work. What’s wrong? The output layer is as follows:

int adjBatchSize = batchSize * seqLength;
int inputSize = outputSizeOfLSTM;
checkCublasErrors(cublasSgemm(cublasHandle_,
                              CUBLAS_OP_N, CUBLAS_OP_T,
                              adjBatchSize, outputSize, inputSize,
                              &FLOAT_ONE,
                              dInput, adjBatchSize,
                              dWeights, inputSize,
                              &FLOAT_ZERO,
                              dNetOutput, adjBatchSize));
cudaDeviceSynchronize();

dInput is the output of the LSTM with shape [seqLength, batchSize, outputSizeOfLSTM] (CUDNN_RNN_DATA_LAYOUT_SEQ_MAJOR_PACKED). dNetOutput should have shape [seqLength, batchSize, outputSize].

Could you please share the output/error logs with us for better debugging?