Nvidia jetson-voice demo container permanent learning

Hi Everyone

I am currently testing question answering on the Xavier NX using the jetson-voice container:

https://ngc.nvidia.com/catalog/containers/nvidia:jetson-voice

But it takes a good amount of time to return an answer if the given text is long.

I know the question might seem naive, but can anyone briefly outline a possible roadmap for letting the jetson-voice demo learn something permanently?

As far as I can tell, the demo currently re-analyzes the given paragraphs every time it receives a new question, in order to search for a suitable answer. If the given text is huge, it takes a long time to analyze the whole text and return an answer.

What I hope to do is change this behavior so that the analysis is done once and later questions are answered faster, perhaps by storing the analysis result in something like a library of Q&As categorized by subject.
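To make the "analyze once, answer many times" idea concrete, here is a minimal sketch of one possible approach, not something the jetson-voice container provides: index the paragraphs once, then for each question pick only the most relevant paragraph(s) and hand those to the QA model as context. `ParagraphIndex`, `tokenize`, and `top_k` are hypothetical names, and the scoring is a simple TF-IDF-style word overlap using only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ParagraphIndex:
    """Hypothetical helper: index paragraphs once, then rank them per question."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)
        # Term counts per paragraph are computed a single time, up front.
        self.term_counts = [Counter(tokenize(p)) for p in self.paragraphs]
        # Document frequency: in how many paragraphs each term appears.
        doc_freq = Counter()
        for counts in self.term_counts:
            doc_freq.update(counts.keys())
        n = len(self.paragraphs)
        # Smoothed inverse document frequency, so rarer terms weigh more.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0
                    for t, df in doc_freq.items()}

    def top_k(self, question, k=1):
        """Return the k paragraphs whose words best overlap the question."""
        q_terms = set(tokenize(question))
        scores = []
        for i, counts in enumerate(self.term_counts):
            score = sum(counts[t] * self.idf[t] for t in q_terms if t in counts)
            scores.append((score, i))
        # Highest score first; ties keep the original paragraph order.
        scores.sort(key=lambda s: (-s[0], s[1]))
        return [self.paragraphs[i] for _, i in scores[:k]]


index = ParagraphIndex([
    "The Jetson Xavier NX module has six CPU cores.",
    "TensorRT optimizes neural networks for inference.",
])
# Only this short context would be passed to the QA model, not the full text.
context = index.top_k("How many CPU cores does the Xavier NX have?", k=1)[0]
```

The QA model still reads a context paragraph for every question; the sketch only tries to keep that context short, so the per-question cost no longer grows with the size of the whole text.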

Any thoughts on this would be appreciated.

Regards,

Hi @amehrez,

Good question - the question answering models that I’ve seen all use the context paragraph to pull the answers from. What they are doing is more akin to similarity ranking than actually “learning” the source material. When the QA models are trained, they are typically trained on many different contexts (for example, from the Stanford SQuAD dataset).
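As a rough illustration of why the context has to be supplied every time: BERT-style models trained on SQuAD typically output a start score and an end score for each context token, and the answer is the highest-scoring span of that context. The sketch below shows only that span-selection step. `best_span` is a hypothetical helper and the scores are made up; a real model would produce them.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) of the best span with start <= end."""
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # Only consider end positions within max_answer_len tokens of the start.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


tokens = "The Jetson Xavier NX has six CPU cores".split()
# Made-up per-token scores for illustration only.
start_scores = [0.1, 0.2, 0.1, 0.0, 0.3, 2.5, 0.4, 0.2]
end_scores = [0.0, 0.1, 0.2, 0.1, 0.0, 0.5, 0.9, 2.8]

start, end, _ = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)  # six CPU cores
```

Because the answer is a slice of the supplied tokens, there is nothing for the model to return unless the context is part of the input.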

I’m personally unaware of a DNN that performs information retrieval on a knowledge base that is baked into the model weights (without supplying a context paragraph), but perhaps the community can share their experiences, or it may be an area of ongoing research.

Something I have been looking into recently is using lightweight BERT models like DistilBERT and MobileBERT, which have similar accuracy/F1-score to BERT-Base but run much faster. NeMo supports training of DistilBERT, and Hugging Face has a bunch of these models in their zoo.

Thanks, @dusty_nv, for your thoughtful reply.

I wanted to try NeMo, but unfortunately it still does not support ARM processors.

I hope someone in the community with hands-on experience in this can give us more insight.

Regards,

You can run the NeMo training notebooks on Google Colab and export the trained models to ONNX. I am making some TensorRT wrappers for running them on Jetson.

I did find that NeMo supports an Information Retrieval model, but like QA, it also appears to have the passage/context text included in the input tokens.
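For reference, here is a rough sketch of how question and context are commonly packed into the input tokens for BERT-style QA, and why long texts get slow. Since the model has a fixed maximum sequence length, a long context is usually split into overlapping windows, and each window needs its own forward pass. I have not checked that the jetson-voice demo does exactly this; `make_windows`, `max_seq_len`, and `overlap` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
def make_windows(question_tokens, context_tokens, max_seq_len=16, overlap=3):
    """Split a long context into overlapping windows that fit the model.

    Each window is laid out as: [CLS] question [SEP] context-chunk [SEP]
    """
    # Room left for context after the question and the three special tokens.
    chunk_len = max_seq_len - len(question_tokens) - 3
    if chunk_len <= 0:
        raise ValueError("question is too long for max_seq_len")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be smaller than the chunk length")
    step = chunk_len - overlap
    windows = []
    start = 0
    while True:
        chunk = context_tokens[start:start + chunk_len]
        windows.append(["[CLS]", *question_tokens, "[SEP]", *chunk, "[SEP]"])
        if start + chunk_len >= len(context_tokens):
            break  # this window reached the end of the context
        start += step
    return windows


question = "how many cpu cores".split()
context = [f"w{i}" for i in range(20)]  # stand-in for a 20-token passage
windows = make_windows(question, context)
print(len(windows))  # 3
```

The number of windows, and so the number of model passes per question, grows roughly linearly with the length of the context, which would match the slowdown seen on long texts.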