As a follow up to this thread - Inference Broken - Long Form Audio and gRPC max message sizes - #4 by shantanu1
We can and have tried using the streaming inference API, but that means we completely lose out on the benefits of offline/batch inference. Is there a workaround we could use, such as sending the audio in 4 MB chunks? I’m not sure how that would affect, say, Citrinet’s attention mechanism.
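For context, here is a rough sketch of the chunking workaround we have in mind — plain Python, no Riva API calls, just splitting the raw audio payload so each piece stays under the default 4 MB gRPC message limit (the constants and helper name are our own, not anything from the SDK):

```python
GRPC_LIMIT = 4 * 1024 * 1024          # default gRPC max message size (4 MB)
CHUNK_BYTES = GRPC_LIMIT - 64 * 1024  # leave some headroom for message framing

def audio_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield successive slices of the audio payload, each <= chunk_bytes."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each chunk would become one request, but our worry is exactly that slicing mid-utterance like this defeats the point of attention over the full audio.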
Are we not expected to run long-form audio on Riva as a use case? I thought long-form transcription was exactly where attention-based models like Citrinet would shine.
If such batch/offline processing is not supported (at least for files over 4 MB), or is not a pipeline you intend to support, please let us know, as we’re looking for a platform to serve that purpose primarily, and Jarvis/Riva seemed like a good fit for exactly that.
We would appreciate any indication of what you have in mind for the future regarding this. Yes, for now we can use streaming inference, but that only makes sense for us as a stand-in if long-form transcripts are eventually possible. Streaming and offline aren’t interchangeable for our purposes.
It also looks like Triton supports configuring a larger message size, and most other inference/training platforms expose this gRPC setting. Can we talk to Triton directly?
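To be concrete about what we mean by exposing the gRPC config: on the client side the limits are just channel options (the server would of course need a matching setting). A minimal sketch, assuming Triton’s default gRPC port 8001 — the target address and the 64 MB figure are our own assumptions:

```python
import grpc

# Raise the client-side gRPC limits above the 4 MB default so a whole
# long-form audio file fits in a single unary request/response.
MAX_MESSAGE_BYTES = 64 * 1024 * 1024  # 64 MB; sized to fit our longest files

channel_options = [
    ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
    ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
]

def make_channel(target: str = "localhost:8001") -> grpc.Channel:
    """Open a channel to the server's gRPC endpoint with larger limits.
    This only helps if the server is also configured to accept them."""
    return grpc.insecure_channel(target, options=channel_options)
```

If Riva simply surfaced these options (or let us set the server-side equivalents), the 4 MB ceiling on offline recognition would go away.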