Looking for documentation on T4 video encoding/decoding limitations

I’ve got an application that does the following:

  • takes in an H.264 stream in real time (1080p, 60 fps, 6000 kbps)
  • decodes the stream on the GPU
  • does some compositing of the image on the GPU (using OpenGL)
  • re-encodes the stream on the GPU (1080p, 60 fps, 6000 kbps)
  • delivers the stream elsewhere

My question is: how many concurrent streams can I expect to get out of a single T4 card? Keep in mind that each “stream” will be decoding as well as encoding.

I’ve seen figures suggesting that in low-latency mode (which is fine for my use case) a T4 should be able to decode ~17 1080p streams, but I’m having trouble finding documentation on how many concurrent encode sessions it can sustain.
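
To make the question concrete, here is a rough Python sketch of how I'm framing the estimate. Each stream needs one decode and one encode, so the per-card count is capped by whichever engine runs out first. The names (`StreamProfile`, `max_concurrent_streams`, `aggregate_bitrate_mbps`) are made up for illustration, the ~17 decode figure is the unverified number mentioned above (I don't know what frame rate it assumes), and the encode limit is a pure placeholder because that is exactly the number I can't find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamProfile:
    """One stream as described in the post: 1080p, 60 fps, 6000 kbps."""
    width: int = 1920
    height: int = 1080
    fps: int = 60
    bitrate_kbps: int = 6000

    @property
    def pixels_per_second(self) -> int:
        # Raw pixel throughput one decode (or encode) has to sustain.
        return self.width * self.height * self.fps


def max_concurrent_streams(decode_limit: int, encode_limit: int) -> int:
    """Every stream uses one decode and one encode, so the smaller limit wins."""
    return min(decode_limit, encode_limit)


def aggregate_bitrate_mbps(profile: StreamProfile, streams: int) -> float:
    """Total network bitrate in one direction (ingest or delivery)."""
    return profile.bitrate_kbps * streams / 1000


if __name__ == "__main__":
    profile = StreamProfile()
    decode_limit = 17          # unverified figure quoted in the post
    encode_limit = 10          # HYPOTHETICAL placeholder, not a measured T4 value
    n = max_concurrent_streams(decode_limit, encode_limit)
    print("pixels/s per stream:", profile.pixels_per_second)
    print("streams (bottleneck):", n)
    print("Mbps in each direction:", aggregate_bitrate_mbps(profile, n))
```

This ignores the OpenGL compositing step and any memory or PCIe limits, so at best it gives an upper bound once a real encoder figure is plugged in.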