The PyTorch RNN source (https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html) contains this comment:
# Second bias vector included for CuDNN compatibility. Only one
# bias vector is needed in standard definition.
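The second bias is mathematically redundant: both biases are simply added to the same pre-activation, so they could be merged into one, just like having two bias vectors in a single dense layer. Why does cuDNN use it?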
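To make the redundancy concrete, here is a minimal sketch in plain Python (no PyTorch) of the pre-activation of a vanilla RNN cell, `W_ih x + b_ih + W_hh h + b_hh`. The helper names `matvec` and `rnn_preactivation`, the sizes, and the random values are all hypothetical and chosen only for illustration. The sketch covers the plain RNN case only, where both biases are added directly to the same sum.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_preactivation(W_ih, W_hh, x, h, b_ih, b_hh):
    """Compute W_ih x + b_ih + W_hh h + b_hh for one time step."""
    ih = matvec(W_ih, x)
    hh = matvec(W_hh, h)
    return [a + b + c + d for a, b, c, d in zip(ih, b_ih, hh, b_hh)]


random.seed(0)
hidden, inputs = 3, 2  # illustrative sizes, not taken from any real model


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


W_ih = [rand_vec(inputs) for _ in range(hidden)]
W_hh = [rand_vec(hidden) for _ in range(hidden)]
x, h = rand_vec(inputs), rand_vec(hidden)
b_ih, b_hh = rand_vec(hidden), rand_vec(hidden)

# Two separate biases, as in the cuDNN-compatible parameter layout.
two_bias = rnn_preactivation(W_ih, W_hh, x, h, b_ih, b_hh)

# One merged bias (b_ih + b_hh), with the second bias set to zero.
merged = [a + b for a, b in zip(b_ih, b_hh)]
zero = [0.0] * hidden
one_bias = rnn_preactivation(W_ih, W_hh, x, h, merged, zero)

# The two results agree up to floating-point rounding.
same = all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(two_bias, one_bias))
print(same)
```

Because only the sum `b_ih + b_hh` enters the result here, a model with two biases can represent nothing that a model with one bias cannot, which is what prompts the question.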