Hello everyone. I would like to deploy the model so that it returns its output as a stream. With TF Serving this can be done as follows:
1. Cut the network into an encoder and a decoder.
2. Run the input sequence through the encoder.
3. Send the decoder RNN its own state together with the encoder states, and get one output back.
4. Repeat step 3 until the EOS token is produced (a client-side sketch of this loop follows the list).
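For reference, here is a minimal sketch of that naive loop against Triton, where both states cross the wire on every step. The model names (`encoder`, `decoder`), tensor names, shapes, and token ids are all assumptions made for illustration, not anything from a real config:

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

client = grpcclient.InferenceServerClient("localhost:8001")

def infer(model, arrays, **kwargs):
    """Run one inference call; `arrays` maps input names to numpy arrays."""
    inputs = []
    for name, arr in arrays.items():
        inp = grpcclient.InferInput(name, list(arr.shape), np_to_triton_dtype(arr.dtype))
        inp.set_data_from_numpy(arr)
        inputs.append(inp)
    return client.infer(model, inputs, **kwargs)

BOS_ID, EOS_ID = 1, 2  # assumed vocabulary ids

# Steps 1-2: run the whole input sequence through the encoder once.
src = np.random.rand(1, 20, 256).astype(np.float32)  # dummy source sequence
enc = infer("encoder", {"input_seq": src})
enc_states = enc.as_numpy("encoder_states")          # attention memory
dec_state = enc.as_numpy("final_state")              # initial decoder RNN state

# Steps 3-4: decode token by token, re-sending both states on every call.
token_id = np.array([[BOS_ID]], dtype=np.int32)
for _ in range(100):                                 # hard cap on output length
    out = infer("decoder", {
        "token_id": token_id,
        "state_in": dec_state,         # decoder RNN state, resent every step
        "encoder_states": enc_states,  # encoder states, resent every step
    })
    dec_state = out.as_numpy("state_out")
    token_id = out.as_numpy("next_token_id")
    print(int(token_id[0, 0]))                       # stream each token out
    if int(token_id[0, 0]) == EOS_ID:
        break
```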
Now the question itself: is it possible, using Triton, to avoid re-sending the encoder states and the RNN state on every step? If I understand correctly, the sequence batcher exists for exactly this, but honestly, looking at the examples in the documentation, I don't understand at all how to apply it to my case.
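From what I can tell, the sequence batcher combined with Triton's implicit state management is indeed the mechanism for this: you declare the recurrent tensors as `state` in `config.pbtxt`, Triton keeps them server-side between requests of the same sequence, and the client only sends the new token each step. Note that implicit state is supported only by some backends (e.g. ONNX Runtime and TensorRT). A minimal sketch for the decoder, where every tensor name and dim is an assumption:

```protobuf
# decoder/config.pbtxt -- sketch only; all names and dims are assumptions
sequence_batching {
  max_sequence_idle_microseconds: 5000000
  control_input [
    {
      name: "START"
      control [{ kind: CONTROL_SEQUENCE_START, fp32_false_true: [0, 1] }]
    },
    {
      name: "READY"
      control [{ kind: CONTROL_SEQUENCE_READY, fp32_false_true: [0, 1] }]
    }
  ]
  state [
    {
      input_name: "state_in"    # what the model reads each step
      output_name: "state_out"  # written by the model, fed back next step
      data_type: TYPE_FP32
      dims: [1, 512]            # assumed RNN state shape
      initial_state: {
        name: "zeros"
        data_type: TYPE_FP32
        dims: [1, 512]
        zero_data: true
      }
    }
  ]
}
```

The client then tags each request with `sequence_id=...`, `sequence_start=True` on the first call and `sequence_end=True` on the EOS step (these are arguments to `client.infer` in tritonclient) and drops `state_in` from its inputs; the encoder states could be kept server-side the same way, as a second `state` entry that the decoder passes through unchanged. One wrinkle to be aware of: on the first step the implicit state is initialized by Triton (zeros or a data file via `initial_state`), not by the encoder's output, so the encoder's final state still has to reach the decoder somehow, e.g. as a regular input on the first request or through an ensemble/BLS model.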