Streaming Inference fails intermittently with error: must specify the START flag on the first request of the sequence

Hardware - GPU T4
Operating System: Docker (Riva Server image)
Riva Version: 1.6.0

I have a Riva ASR model deployed and it is successfully performing streaming ASR over gRPC from a Node.js client.
However, I intermittently get the following error:

E1105 15:57:29.546823  7732 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 1714898170 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence

Before streaming any audio, I make sure the server is ready and send the StreamingConfig request as the first message of the stream.
Could this be a server-side error caused by some sequencing issue?
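For reference, the client flow is roughly the following (a simplified sketch rather than our exact production code; it assumes the public riva_asr.proto is loaded with @grpc/proto-loader and @grpc/grpc-js, the proto path and audio file are placeholders, and the snake_case field names streaming_config / audio_content follow that proto):

```typescript
// Simplified sketch of the Node.js client flow (not the exact production code).
// Assumptions: riva_asr.proto is available locally, and @grpc/grpc-js +
// @grpc/proto-loader are used with keepCase so field names stay snake_case.
import * as fs from "fs";
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

const packageDef = protoLoader.loadSync("riva/proto/riva_asr.proto", {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
});
const rivaPkg: any = grpc.loadPackageDefinition(packageDef);

// Package path follows the Riva 2.x protos (nvidia.riva.asr); adjust for older releases.
const client = new rivaPkg.nvidia.riva.asr.RivaSpeechRecognition(
  "localhost:50051",
  grpc.credentials.createInsecure()
);

const call = client.streamingRecognize();
call.on("data", (resp: any) => console.log(JSON.stringify(resp.results)));
call.on("error", (err: grpc.ServiceError) => console.error("stream error:", err.message));

// 1) The very first message carries only the streaming config -- this is what
//    lets the server open the sequence with the START flag.
call.write({
  streaming_config: {
    config: {
      encoding: "LINEAR_PCM",
      sample_rate_hertz: 16000,
      language_code: "ar-BH",
      audio_channel_count: 1,
    },
    interim_results: true,
  },
});

// 2) Every subsequent message carries only audio bytes.
const audio = fs.readFileSync("utterance.raw"); // placeholder: 16 kHz, 16-bit mono PCM
const CHUNK_BYTES = 3200; // ~100 ms of 16 kHz, 16-bit mono audio per message
for (let off = 0; off < audio.length; off += CHUNK_BYTES) {
  call.write({ audio_content: audio.subarray(off, off + CHUNK_BYTES) });
}

// 3) Half-close the stream so the server knows the sequence is finished.
call.end();
```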


Below is a longer excerpt of the server logs from just before and after the error:

I1105 15:56:23.809695 10552 grpc_riva_asr.cc:870] Using model Cn-SpeUni256-EaTl380-mn2 for inference
I1105 15:56:23.809767 10552 grpc_riva_asr.cc:886] Model sample rate= 16000 for inference
I1105 15:56:24.383030 10552 riva_asr_stream.cc:219] Detected format: encoding = 1 RAW numchannels = 1 samplerate = 16000 bitspersample = 16
I1105 15:57:29.545948  7733 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545979 10628 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546018  7613 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545966 10217 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545991 10556 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546125 10450 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545979 10582 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546255  8730 grpc_riva_asr.cc:590] Send silence buffer for EOS
E1105 15:57:29.546823  7732 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 1714898170 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.546990  7611 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 229575147 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547165 10216 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 2009964687 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547258 10449 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 2086559401 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547461 10581 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 554894007 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547472  8728 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 149722295 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
I1105 15:57:29.564607 10600 grpc_riva_asr.cc:975] ASRService.StreamingRecognize returning OK
I1105 15:57:29.564905 10323 grpc_riva_asr.cc:975] ASRService.StreamingRecognize returning OK
I1105 15:58:09.361279 10921 grpc_riva_asr.cc:799] ASRService.StreamingRecognize called.
I1105 15:58:09.361320 10921 grpc_riva_asr.cc:833] ASRService.StreamingRecognize performing streaming recognition with sequence id: 720730861

Hi @pineapple9011,
Apologies for the delay; please allow us some time to check on this.

Thanks!

Hi @pineapple9011, can you share any details on how we can replicate this? How frequent is ‘intermittent’? Once you see this error, do other inferences continue to work as expected or does the server stop responding properly for all requests?

Sorry for the trouble here, and thanks for your assistance with helping us debug.

@rleary Sorry for the delay here! I never got a notification about this.
Here is the config I used for building the model:

riva-build speech_recognition \
  /servicemaker-dev/$RIVA_VERSION/$OUTPUT_MODEL_NAME.rmir \
  /servicemaker-dev/$RIVA_VERSION/$OUTPUT_MODEL_NAME.riva \
  --force \
  --name="$OUTPUT_MODEL_NAME" \
  --language_code="ar-BH" \
  --streaming \
  --decoder_type=greedy \
  --chunk_size=1.2 \
  --padding_size=2.4 \
  --ms_per_timestep=80 \
  --greedy_decoder.asr_model_delay=-1 \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --nn.instance_group_count=2 \
  --vad.vad_start_history=300 \
  --vad.vad_start_th=0.2 \
  --vad.vad_stop_history=1200 \
  --vad.vad_stop_th=0.98

Attached are some Prometheus metrics from Triton over the last 7 days.
I was hoping to also show metrics from early November, since they showed far fewer errors (though more intermittently), but it seems they are too old to be displayed.

Now that I look at this, could it be an issue with autoscaling and GPU resources?
If so, it would be good to add suggested GPU allocations, based on the number of concurrent requests/streams per model, to the “Performance” section of the docs.

Requests per minute
You can see the succeeded requests in yellow and the failed requests in blue

Failed requests percentage
Percentage of failed requests w.r.t all requests

GPU Utilization

We resolved this by making sure that the gRPC streaming channels are properly terminated and closed on the client side.

It seems this is not a server-side (Triton) issue.
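For anyone hitting the same issue, the gist of the client-side change is sketched below (function names are illustrative; `call` is the duplex stream returned by streamingRecognize and `client` is the ASR stub):

```typescript
import * as grpc from "@grpc/grpc-js";

// Illustrative helpers -- the key point is that every stream is explicitly
// half-closed or cancelled, and the channel is released when finished,
// instead of being left half-open for the server-side sequence to time out.
export function finishRecognitionStream(
  call: grpc.ClientDuplexStream<unknown, unknown>
): void {
  call.end(); // half-close: signals end of audio so the sequence completes cleanly
}

export function abortRecognitionStream(
  call: grpc.ClientDuplexStream<unknown, unknown>
): void {
  call.cancel(); // on errors / caller hang-up, cancel instead of abandoning the call
}

export function releaseClient(client: grpc.Client): void {
  client.close(); // drop the underlying channel once all streams are done
}
```

In practice we call end() as soon as the last audio chunk has been written, and close the channel only after the final response (or an error) has arrived, rather than leaving stream objects to be garbage-collected.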

@rleary Unfortunately, this issue came up again in the latest version of Riva (2.0.0).

Any thoughts on why this could be happening again?

FYI, the new error we see on the client is:

Error: 2 UNKNOWN: in ensemble 'CnLgGm025-streaming', inference request for sequence 1817244930 to model 'CnLgGm025-streaming-feature-extractor-streaming' must specify the START flag on the first request of the sequence