Hardware - GPU (T4)
Hardware - CPU
Operating System
Riva Version 2.9.0
TLT Version (if relevant)
I set the variable interim_results to false in the script python-clients/scripts/asr/riva_streaming_asr_client.py and placed a one-hour audio file named nlp.wav in the folder data/examples. I then ran the command: cd python-clients; python scripts/asr/riva_streaming_asr_client.py --input-file data/examples/nlp.wav. The last timestamp printed is 112.24s, but I expected a value around 3600s, since the file is one hour long. What does the Time value printed here mean?
The following are the last few lines printed by the script:
Time 111.34s: Transcript 0: sub sample of some words where it works incredibly well it’s also true that when you really play around with it for a while you’ll find some things where like oh liked minus german goes to some crazy sushi term or something it doesn’t always make sense but there are a lot of them where it really is surprisingly Intuitive and so people essentially then came up with
Time 111.40s: Transcript 0: a dataset to try to see how often does it really appear
Time 111.43s: Transcript 0: and does it really
Time 111.63s: Transcript 0: Work this well and so they basically collected this word vector analogies task and these are some examples you can download all of them on this link here this is the again the original word paper that discovered
Time 111.82s: Transcript 0: and described these linear relationships and they basically look at chicago and Illinois and houston Texas and you can basically come up with a lot of different analogies where you know the city appears in that state
Time 112.13s: Transcript 0: of course there are some problems and you know as you optimize this metric more and more you will observe like oh well maybe that city name actually appears you know multiple different cities and different states have the same name and then it kind of depends on your corpus that you’re training on on whether this is being captured or not but still a lot of people
Time 112.24s: Transcript 0: it makes a lot of sense for most of them to optimize this at least for a little bit here are some other examples of
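To narrow this down, it may help to confirm the file's real duration and compare it with the last Time value. The sketch below uses only Python's standard wave module. The helper names wav_duration_seconds and realtime_factor are hypothetical and not part of the Riva client. It also rests on an assumption that is not confirmed anywhere in this post: that Time is wall-clock time elapsed since the request started, rather than a position in the audio.

```python
import wave


def wav_duration_seconds(path):
    """Return the playback length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def realtime_factor(audio_seconds, elapsed_seconds):
    """Return how many seconds of audio were handled per second of elapsed time.

    ASSUMPTION: elapsed_seconds is wall-clock time, which is only one
    possible reading of the "Time" column printed by the client.
    """
    return audio_seconds / elapsed_seconds
```

Run from the python-clients directory, wav_duration_seconds("data/examples/nlp.wav") should return roughly 3600 if the file really is one hour long. If it does, and Time is indeed wall-clock time, then realtime_factor(3600, 112.24) would describe how much faster than real time the file was streamed, which would explain a final value far below 3600s.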