Streaming Inference fails intermittently with error: must specify the START flag on the first request of the sequence

Hardware - GPU T4
Operating System: Docker (Riva Server image)
Riva Version: 1.6.0

I have a Riva ASR model deployed and it is successfully performing streaming ASR over gRPC from a Node.js client.
However, I intermittently get the following error:

E1105 15:57:29.546823  7732 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 1714898170 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence

Before streaming any audio, I make sure the server is ready and send the StreamingConfig request as the first message of the stream.
Could this be a server-side error caused by some sequencing issue?
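For reference, the client flow is roughly the following (a simplified sketch rather than our exact production code; it assumes the public riva_asr.proto is loaded with @grpc/proto-loader and @grpc/grpc-js, the proto path and audio file are placeholders, and the snake_case field names streaming_config / audio_content follow that proto):

```typescript
// Simplified sketch of the Node.js client flow (not the exact production code).
// Assumptions: riva_asr.proto is available locally, and @grpc/grpc-js +
// @grpc/proto-loader are used with keepCase so field names stay snake_case.
import * as fs from "fs";
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

const packageDef = protoLoader.loadSync("riva/proto/riva_asr.proto", {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
});
const rivaPkg: any = grpc.loadPackageDefinition(packageDef);

// Package path follows the Riva 2.x protos (nvidia.riva.asr); adjust for older releases.
const client = new rivaPkg.nvidia.riva.asr.RivaSpeechRecognition(
  "localhost:50051",
  grpc.credentials.createInsecure()
);

const call = client.streamingRecognize();
call.on("data", (resp: any) => console.log(JSON.stringify(resp.results)));
call.on("error", (err: grpc.ServiceError) => console.error("stream error:", err.message));

// 1) The very first message carries only the streaming config -- this is what
//    lets the server open the sequence with the START flag.
call.write({
  streaming_config: {
    config: {
      encoding: "LINEAR_PCM",
      sample_rate_hertz: 16000,
      language_code: "ar-BH",
      audio_channel_count: 1,
    },
    interim_results: true,
  },
});

// 2) Every subsequent message carries only audio bytes.
const audio = fs.readFileSync("utterance.raw"); // placeholder: 16 kHz, 16-bit mono PCM
const CHUNK_BYTES = 3200; // ~100 ms of 16 kHz, 16-bit mono audio per message
for (let off = 0; off < audio.length; off += CHUNK_BYTES) {
  call.write({ audio_content: audio.subarray(off, off + CHUNK_BYTES) });
}

// 3) Half-close the stream so the server knows the sequence is finished.
call.end();
```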


Below is a longer excerpt of the server logs from just before and after the error:

I1105 15:56:23.809695 10552 grpc_riva_asr.cc:870] Using model Cn-SpeUni256-EaTl380-mn2 for inference
I1105 15:56:23.809767 10552 grpc_riva_asr.cc:886] Model sample rate= 16000 for inference
I1105 15:56:24.383030 10552 riva_asr_stream.cc:219] Detected format: encoding = 1 RAW numchannels = 1 samplerate = 16000 bitspersample = 16
I1105 15:57:29.545948  7733 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545979 10628 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546018  7613 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545966 10217 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545991 10556 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546125 10450 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.545979 10582 grpc_riva_asr.cc:590] Send silence buffer for EOS
I1105 15:57:29.546255  8730 grpc_riva_asr.cc:590] Send silence buffer for EOS
E1105 15:57:29.546823  7732 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 1714898170 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.546990  7611 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 229575147 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547165 10216 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 2009964687 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547258 10449 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 2086559401 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547461 10581 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 554894007 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
E1105 15:57:29.547472  8728 libriva_asr.cc:141] The inference failed: in ensemble 'Cn-SpeUni256-EaTl380-mn2', inference request for sequence 149722295 to model 'Cn-SpeUni256-EaTl380-mn2-feature-extractor-streaming' must specify the START flag on the first request of the sequence
I1105 15:57:29.564607 10600 grpc_riva_asr.cc:975] ASRService.StreamingRecognize returning OK
I1105 15:57:29.564905 10323 grpc_riva_asr.cc:975] ASRService.StreamingRecognize returning OK
I1105 15:58:09.361279 10921 grpc_riva_asr.cc:799] ASRService.StreamingRecognize called.
I1105 15:58:09.361320 10921 grpc_riva_asr.cc:833] ASRService.StreamingRecognize performing streaming recognition with sequence id: 720730861

Hi @pineapple9011,
Apologies for the delay; please allow us some time to check on this.

Thanks!

Hi @pineapple9011, can you share any details on how we can replicate this? How frequent is ‘intermittent’? Once you see this error, do other inferences continue to work as expected or does the server stop responding properly for all requests?

Sorry for the trouble here, and thanks for your assistance with helping us debug.

@rleary Sorry for the delay here! I never got a notification about this.
Here is the config I used for building the model:

riva-build speech_recognition \
  /servicemaker-dev/$RIVA_VERSION/$OUTPUT_MODEL_NAME.rmir \
  /servicemaker-dev/$RIVA_VERSION/$OUTPUT_MODEL_NAME.riva \
  --force \
  --name="$OUTPUT_MODEL_NAME" \
  --language_code="ar-BH" \
  --streaming \
  --decoder_type=greedy \
  --chunk_size=1.2 \
  --padding_size=2.4 \
  --ms_per_timestep=80 \
  --greedy_decoder.asr_model_delay=-1 \
  --featurizer.use_utterance_norm_params=False \
  --featurizer.precalc_norm_time_steps=0 \
  --featurizer.precalc_norm_params=False \
  --nn.instance_group_count=2 \
  --vad.vad_start_history=300 \
  --vad.vad_start_th=0.2 \
  --vad.vad_stop_history=1200 \
  --vad.vad_stop_th=0.98

Attached are some Prometheus metrics from Triton over the last 7 days.
I was hoping to also show metrics from early November, since they showed far fewer errors (though more intermittently), but it seems they are too old to be displayed.

Now that I look at this, could it be an issue with autoscaling and GPU resources?
If so, it would be good to add suggested GPU allocations, based on the number of concurrent requests/streams per model, to the “Performance” section of the docs.

Requests per minute
You can see the succeeded requests in yellow and the failed requests in blue

Failed requests percentage
Percentage of failed requests w.r.t all requests

GPU Utilization

We resolved this by making sure that the gRPC streaming channels are properly terminated and closed on the client side.

It seems this is not a server-side (Triton) issue.
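For anyone hitting the same issue, the gist of the client-side change is sketched below (function names are illustrative; `call` is the duplex stream returned by streamingRecognize and `client` is the ASR stub):

```typescript
import * as grpc from "@grpc/grpc-js";

// Illustrative helpers -- the key point is that every stream is explicitly
// half-closed or cancelled, and the channel is released when finished,
// instead of being left half-open for the server-side sequence to time out.
export function finishRecognitionStream(
  call: grpc.ClientDuplexStream<unknown, unknown>
): void {
  call.end(); // half-close: signals end of audio so the sequence completes cleanly
}

export function abortRecognitionStream(
  call: grpc.ClientDuplexStream<unknown, unknown>
): void {
  call.cancel(); // on errors / caller hang-up, cancel instead of abandoning the call
}

export function releaseClient(client: grpc.Client): void {
  client.close(); // drop the underlying channel once all streams are done
}
```

In practice we call end() as soon as the last audio chunk has been written, and close the channel only after the final response (or an error) has arrived, rather than leaving stream objects to be garbage-collected.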

@rleary Unfortunately, this issue came up again in the latest version of Riva (2.0.0).

Any thoughts on why this could be happening again?

FYI, the new error we see on the client is:

Error: 2 UNKNOWN: in ensemble 'CnLgGm025-streaming', inference request for sequence 1817244930 to model 'CnLgGm025-streaming-feature-extractor-streaming' must specify the START flag on the first request of the sequence