[Riva] - Is it possible to cache inferences for TTS?

Please provide the following information when requesting support.

Hardware - GPU (T4)
Hardware - CPU
Operating System
Riva Version
TLT Version (if relevant)

We are currently using Riva to synthesize phrases into audio files. Is it possible to cache the response for a repeated phrase at the Triton server level? If not, is there a recommended way to do this type of caching?


Caching is fairly application specific; the solution would probably differ depending on whether you are building for mobile, a call center, or the web. The Riva team does not currently have a best practice for this. Can you outline your use case in more detail?


Our use case is a mobile application accessing a Riva (or Triton) server for TTS.
The question from arnab is meant to understand the following:

If a specific phrase has already been synthesized, is there a recommended approach to caching the synthesized output when exactly the same phrase is requested again?

“Hello, how are you?” requested for the first time results in synthesis.
The same phrase requested subsequently results in the cached response.

Is this a supported or recommended approach?
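
To make the behaviour described above concrete, here is a minimal sketch of an exact-match cache that sits in the application, in front of the TTS backend. It is only an illustration under assumptions: `TTSCache`, the `synthesize` callable, and the `voice` parameter are hypothetical names, not part of the Riva or Triton API, and the real synthesis call would have to be plugged in where the stand-in backend is used.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class TTSCache:
    """Exact-match LRU cache in front of a TTS backend (hypothetical helper)."""

    def __init__(self, synthesize: Callable[[str, str], bytes], max_entries: int = 1024):
        # `synthesize` is an assumption: any function that takes (text, voice)
        # and returns audio bytes, e.g. a wrapper around the real TTS request.
        self._synthesize = synthesize
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # The key covers the voice as well as the text, so the same phrase
        # synthesized with a different voice is not served from the cache.
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice: str = "default") -> bytes:
        key = self._key(text, voice)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used and return it.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: synthesize once, store the result, evict the oldest entry if full.
        audio = self._synthesize(text, voice)
        self._entries[key] = audio
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return audio


# Stand-in backend for illustration only; it just records how often it is called.
backend_calls = []

def fake_synthesize(text: str, voice: str) -> bytes:
    backend_calls.append(text)
    return f"<audio:{voice}:{text}>".encode("utf-8")

cache = TTSCache(fake_synthesize)
first = cache.get_audio("Hello, how are you?")   # first request: synthesized
second = cache.get_audio("Hello, how are you?")  # repeat request: served from cache
```

Matching is on the exact string, so "Hello, how are you ?" and "Hello, how are you?" would be treated as different phrases unless the text is normalized before lookup. Whether the cache should live on the device, in a gateway in front of the server, or in a shared store depends on the deployment.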