[Riva] - Is it possible to cache inferences for TTS?

Hardware - GPU (T4)

We are currently using Riva to synthesize audio files from text phrases. Is it possible to cache the response for a repeated phrase at the Triton server level? If not, is there a recommended way to do this type of caching?

Caching is fairly application-specific; the solution would probably differ depending on whether you are building for mobile, a call center, or the web. The Riva team does not currently have a best practice for this. Can you outline your use case in more detail?


Our use case is a mobile application that accesses a Riva (or Triton) server for TTS.
The question from arnab is:

If a specific phrase has already been synthesized, is there a recommended approach to caching the synthesized output for when exactly the same phrase is requested again?

“Hello, how are you?” - The first request results in synthesis.
Subsequent requests for the same phrase return the cached response.

Is this a supported or recommended approach?
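
For illustration, here is a minimal sketch of the exact-match behaviour described above, done in the application layer rather than inside Riva or Triton. Everything named here is hypothetical: `TTSCache`, `get_audio`, the `voice` parameter, and `fake_synthesize`, which only stands in for the real call to the TTS server. None of it is a Riva or Triton API.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """Exact-match cache placed in front of a TTS backend (hypothetical helper)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        # `synthesize` is whatever function actually sends text to the TTS server.
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # The key covers everything that changes the audio; here just voice + text.
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice: str = "default") -> bytes:
        key = self._key(text, voice)
        if key in self._store:
            self.hits += 1          # same phrase seen before: return cached audio
            return self._store[key]
        self.misses += 1            # first request: synthesize and remember
        audio = self._synthesize(text, voice)
        self._store[key] = audio
        return audio


def fake_synthesize(text: str, voice: str) -> bytes:
    # Placeholder for the real TTS request; returns dummy bytes, not audio.
    return f"<audio:{voice}:{text}>".encode("utf-8")


cache = TTSCache(fake_synthesize)
first = cache.get_audio("Hello, how are you?")    # triggers synthesis
second = cache.get_audio("Hello, how are you?")   # served from the cache
```

The in-memory dictionary is unbounded and is lost on restart, so a real deployment would probably need an eviction policy and a shared or persistent store. The lookup is also byte-exact: "Hello, how are you?" and "Hello, how are you ?" would be cached as two different phrases unless the text is normalized first.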