Speech_to_text_citrinet infer yields random transcription results

Could you please refer to the topic "Tao speech_to_text evaluate+infer show very weak results - #26 by Morganh" and run some experiments?
In that topic, I ran speech_to_text and the results were fine.

In your case, when running speech_to_text_citrinet, you can use the pretrained "Speech to Text English Citrinet" model from NVIDIA NGC.