Structure of BiRNNs in TensorRT is different from PyTorch

For TensorRT: [diagram]
For PyTorch: [diagram]
The concatenated output for each layer is different. Why?


Could you please elaborate on the issue?
Also, is there any performance or accuracy difference between the two implementations?

Thanks. As shown in the BiRNNs diagram, h0,0 is the concatenation of h0f,0 and h0b,0, but in PyTorch, h0,0 is the concatenation of h1f,0 and hTb,0.

I am using addRNNv2 to create a 3-layer BLSTM model. The outputs of the two implementations are not equal, even though the input data, weights, biases and initial states are the same.


Could you please share the script to reproduce the issue?

Also, can you provide the following information so we can better help?
Provide details on the platforms you are using:
o Linux distro and version
o GPU type
o NVIDIA driver version
o CUDA version
o cuDNN version
o Python version [if using Python]
o TensorFlow and PyTorch version
o TensorRT version


The first picture comes from the TensorRT documentation. The second picture shows the general structure of a BiRNN, and the two differ. In the general structure, the output yi (0 <= i < T) is the concatenation of the forward hidden state hif and the backward hidden state h(T-i)b, but in TensorRT, the output yi is the concatenation of the forward hidden state hif and the backward hidden state hib.

void LSTM::addToModel(
    nvinfer1::INetworkDefinition* network,
    nvinfer1::ITensor* inputData,
    nvinfer1::ITensor* sequenceLength,
    nvinfer1::ITensor* hiddenState,
    nvinfer1::ITensor* cellState,
    nvinfer1::ITensor** outputState)
{
    int maxSeqLen = inputData->getDimensions().d[0];
    auto rnn = network->addRNNv2(
        ...
        mNumLayers, // layers count
        ...
        maxSeqLen, // max sequence length
        ...);
    assert(rnn != nullptr);

    ...

    std::vector<nvinfer1::RNNGateType> gateOrder({nvinfer1::RNNGateType::kINPUT,
        ...});
    // Each group of 8 entries shares one layer index (i / 8):
    // the first 4 are input (W) gates, the last 4 are recurrent (R) gates.
    for (size_t i = 0; i < mGateKernelWeights.size(); i++)
    {
        bool isW = ((i % 8) < 4);
        rnn->setWeightsForGate(i / 8, gateOrder[i % 4], isW, mGateKernelWeights[i]);
        rnn->setBiasForGate(i / 8, gateOrder[i % 4], isW, mGateBiasWeights[i]);
    }

    ...

    *outputState = rnn->getOutput(0);
}
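
The index arithmetic in the loop above can be unpacked with a small standalone helper. This is only a sketch: `GateSlot` and `decodeGateIndex` are hypothetical names, not part of TensorRT, and the split of the layer index into a stacked layer and a direction assumes that forward and backward sub-layers alternate (forward at 2L, backward at 2L+1), which should be checked against the IRNNv2Layer documentation.

```cpp
#include <cstddef>

// Hypothetical helper type for illustration; not a TensorRT type.
struct GateSlot
{
    std::size_t layerIndex;   // value passed as the first argument (i / 8)
    std::size_t gatePos;      // position within gateOrder (i % 4)
    bool isW;                 // true: input weights, false: recurrent weights
    std::size_t stackedLayer; // ASSUMPTION: layerIndex / 2 for a bidirectional RNN
    bool isBackward;          // ASSUMPTION: odd layerIndex is the backward sub-layer
};

// Decode a flat index into the weight/bias vectors used by the loop above.
GateSlot decodeGateIndex(std::size_t i)
{
    GateSlot s;
    s.layerIndex = i / 8;
    s.gatePos = i % 4;
    s.isW = (i % 8) < 4;
    s.stackedLayer = s.layerIndex / 2;
    s.isBackward = (s.layerIndex % 2) == 1;
    return s;
}
```

Under these assumptions, a 3-layer BLSTM would need 48 entries in each vector (3 layers x 2 directions x 8 gate matrices), and entry 13 would be a recurrent matrix for the backward sub-layer of the first stacked layer.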

This issue has nothing to do with the platform.

Hi SunilJB:

The TensorRT BiRNN code I posted works OK now; the output is the same as PyTorch's.

So this indicates that the diagram in the TensorRT documentation is wrong; I was misled by it.
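
For readers comparing the two conventions discussed in this thread, here is a minimal sketch of how a bidirectional output is usually assembled. It uses a toy scalar cell (`cellStep`, a hypothetical stand-in for a real LSTM cell) and hypothetical function names; it is not TensorRT or PyTorch code. The point it illustrates is that the backward state paired with time step t is the one produced after the backward pass has consumed inputs T-1 down to t, which is processing step T-1-t when counting from zero. Whether a diagram labels that state by time step or by processing step changes the subscript but not the computation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy scalar recurrent cell, for illustration only: h_new = tanh(w*x + r*h_prev).
double cellStep(double x, double hPrev, double w, double r)
{
    return std::tanh(w * x + r * hPrev);
}

// Forward pass: result[t] is the state after consuming x[0..t].
std::vector<double> runForward(const std::vector<double>& x, double w, double r)
{
    std::vector<double> out;
    double h = 0.0;
    for (std::size_t t = 0; t < x.size(); ++t)
    {
        h = cellStep(x[t], h, w, r);
        out.push_back(h);
    }
    return out;
}

// Backward pass stored in processing order:
// result[k] is the state after consuming x[T-1], ..., x[T-1-k].
std::vector<double> runBackwardByStep(const std::vector<double>& x, double w, double r)
{
    const std::size_t T = x.size();
    std::vector<double> out;
    double h = 0.0;
    for (std::size_t k = 0; k < T; ++k)
    {
        h = cellStep(x[T - 1 - k], h, w, r);
        out.push_back(h);
    }
    return out;
}

struct BiOutput
{
    double fwd; // forward state at time t
    double bwd; // backward state at time t
};

// Time-ordered bidirectional output: y[t] pairs the forward state at time t
// with the backward state produced at processing step T-1-t.
std::vector<BiOutput> bidirectionalOutput(
    const std::vector<double>& x, double wF, double rF, double wB, double rB)
{
    const std::size_t T = x.size();
    const std::vector<double> f = runForward(x, wF, rF);
    const std::vector<double> b = runBackwardByStep(x, wB, rB);
    std::vector<BiOutput> y(T);
    for (std::size_t t = 0; t < T; ++t)
    {
        y[t] = BiOutput{f[t], b[T - 1 - t]};
    }
    return y;
}
```

With an input of length 3, `bidirectionalOutput(x, ...)[0].bwd` equals the last entry of `runBackwardByStep(x, ...)`, and `[2].bwd` equals the first, since the backward pass starts at the end of the sequence.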