Hi!
We have a custom-trained NeMo model: a Conformer with a char-encoded RNNT decoder.
To be explicit, it’s a Conformer model using char encoding (instead of BPE) and an RNNT/Transducer decoder (instead of CTC). The model class is EncDecRNNTModel.
I’m trying to get this working in streaming (aka buffered inference) mode.
There are some excellent notebooks with explanations and example code showing how to do streaming with NeMo: this, here and here.
(Yes, I do realize that these notebooks are in the NeMo GitHub repo, not on Google per se.)
I’m running into problems that might be because the examples have not been updated for the latest NeMo versions, or maybe it’s something else. Either way, I would really appreciate any help.
The short version of the problem is that I get this error when I try to run the streaming code:
AttributeError: 'EncDecRNNTModel' object has no attribute 'tokenizer'
Specifically, the LongestCommonSubsequenceBatchedFrameASRRNNT class (from nemo/collections/asr/parts/utils/streaming_utils.py) references asr_model.tokenizer.
It does that on line 715:
    if hasattr(asr_model.decoder, "vocabulary"):
        self.blank_id = len(asr_model.decoder.vocabulary)
    else:
        self.blank_id = len(asr_model.joint.vocabulary)
    self.tokenizer = asr_model.tokenizer  # <-- here
The problem is that the asr_model I’m using, an EncDecRNNTModel from NeMo 1.20, doesn’t have a tokenizer attribute. Methods like decode_ids_to_tokens live on the model.decoding object instead.
Maybe this stuff got moved around and I just need to make some small changes to the streaming code, or maybe I’m very confused about the whole thing.
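One possible "small change", sketched under assumptions and not tested against NeMo 1.20: give the char-based model a minimal tokenizer-like object before handing it to the streaming class. Everything named here is hypothetical, not NeMo API: CharTokenizerShim, ensure_tokenizer, and the method names ids_to_tokens / ids_to_text / text_to_ids are guesses at what the streaming utilities might call. The demo uses a plain stand-in object rather than a real EncDecRNNTModel so it runs without NeMo installed.

```python
from types import SimpleNamespace


class CharTokenizerShim:
    """Hypothetical stand-in for a tokenizer on a char-based model (not NeMo API)."""

    def __init__(self, vocabulary):
        self.vocab = list(vocabulary)
        self._char_to_id = {ch: i for i, ch in enumerate(self.vocab)}

    def ids_to_tokens(self, ids):
        # Skip ids outside the vocabulary, e.g. the RNNT blank (== len(vocab)).
        return [self.vocab[i] for i in ids if 0 <= i < len(self.vocab)]

    def ids_to_text(self, ids):
        # For a char vocabulary, the text is just the characters joined together.
        return "".join(self.ids_to_tokens(ids))

    def text_to_ids(self, text):
        # Characters missing from the vocabulary are silently dropped.
        return [self._char_to_id[ch] for ch in text if ch in self._char_to_id]


def ensure_tokenizer(asr_model):
    """Attach a shim only if the model has no tokenizer attribute of its own."""
    if not hasattr(asr_model, "tokenizer"):
        decoder = asr_model.decoder
        # Same vocabulary lookup order as the blank_id code quoted above.
        source = decoder if hasattr(decoder, "vocabulary") else asr_model.joint
        asr_model.tokenizer = CharTokenizerShim(source.vocabulary)
    return asr_model


# Stand-in object for demonstration only; a real EncDecRNNTModel would go here.
fake_model = SimpleNamespace(
    decoder=SimpleNamespace(vocabulary=[" ", "a", "b", "c"]),
    joint=SimpleNamespace(vocabulary=[" ", "a", "b", "c"]),
)
ensure_tokenizer(fake_model)
print(fake_model.tokenizer.ids_to_text([1, 2, 0, 3, 4]))  # prints "ab c"; id 4 is the blank
```

Whether this is enough depends on which tokenizer methods the streaming code actually calls in 1.20, which would need to be checked against streaming_utils.py. The alternative would be to patch the streaming class to go through model.decoding instead.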
Any help very much appreciated! Thanks in advance.
BTW I’m using …
>>> nemo.__version__
'1.20.0'