Inference Broken - Long Form Audio and gRPC max message sizes

Hardware:
GPU: T4 (on an AWS EC2 instance), 16 GB
CPU: Intel® Xeon® Scalable (Cascade Lake), 4 x vCPU
Operating System:
Ubuntu 18.04 LTS with the NVIDIA Deep Learning AMI
Riva Version:
v1.5.0-beta
How to reproduce the issue?

  1. Have Riva set up and running with the Citrinet offline and streaming models enabled in config.sh; in my case via the Quickstart and on an EC2 instance.
  2. Run the Riva client on the same instance by running bash riva_start_client.sh.
  3. Copy an audio file larger than 4 MiB into the riva-client container and try to process it using the streaming model. This fails with a gRPC error stating that the received message is larger than the max message size. Here’s some output:
>riva_asr_client --max_alternatives=0 --audio-file=./wav/working_files/longform_test.wav --print_transcripts=true
Loading eval dataset...
filename: /work/wav/working_files/longform_test.wav

Done loading 1 files
RPC failed: Received message larger than max (150952067 vs. 4194304)
Done processing 1 responses
Some requests failed to complete properly, not printing performance stats

150952067 bytes ≈ 144 MiB
4194304 bytes = 4 MiB

A few considerations and things that I’ve tried: it seems you can pass an options object when setting up the gRPC channel client-side, and the server can decide whether to respect that message-size setting. I tried this by adding it to the --server argument in riva_quickstart/examples/transcribe_file_offline.py.

parser.add_argument("--server", default="localhost:50051", options=[("grpc.max_receive_message_length", 1024*1024*1024 ), ("grpc.max_send_message_length", 1024*1024*1024 )], type=str, help="URI to GRPC server endpoint")
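For completeness, the usual client-side pattern for these options in Python gRPC is to pass them as channel options when the channel is created, rather than through argparse. Below is a minimal sketch using only the plain grpc module (the endpoint matches the quickstart default); note that this only raises the limits on the client’s side of the connection, while the server still enforces its own maximum.

import grpc

# Channel options raise the send/receive limits for this client only;
# the Riva server still applies its own max_receive_message_length
# (4 MiB by default), so oversized requests can still be rejected.
options = [
    ("grpc.max_send_message_length", 1024 * 1024 * 1024),
    ("grpc.max_receive_message_length", 1024 * 1024 * 1024),
]
channel = grpc.insecure_channel("localhost:50051", options=options)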

However, the server responds with a gRPC error (Error received from peer ipv6:[::1]:50051) and the following stack trace is printed:

>python3 transcribe_file_offline.py --audio-file=/work/wav/working_files/longform_test.wav 
Traceback (most recent call last):
  File "transcribe_file_offline.py", line 62, in <module>
    response = client.Recognize(request)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (150952038 vs. 4194304)"
	debug_error_string = "{"created":"@1631874680.490127888","description":"Error received from peer ipv6:[::1]:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Received message larger than max (150952038 vs. 4194304)","grpc_status":8}"

IMO this breaks inference on long-form audio. Can we expect this to be fixed or changed?

Hi @ShantanuNair ,
We don’t recommend running files much larger than what is currently supported. We might add a configuration option in the future.
The best option for now is to simply use the streaming API instead.
Can you please try that?
Thanks!
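To illustrate the streaming suggestion, the file can be read and sent in fixed-size chunks so that no single gRPC message exceeds the limit. A rough sketch of the chunking side is below; the request wrapping depends on the generated Riva protos, so the names mentioned in the comments are placeholders rather than the exact API.

def audio_chunks(path, chunk_size=1024 * 1024):
    # Yield raw audio bytes in pieces comfortably under the default 4 MiB gRPC limit.
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data

# Each chunk would then be wrapped in a streaming recognition request
# (the exact message type comes from the generated Riva protos) and sent
# over the streaming RPC instead of one large Recognize call.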

Hi @AakankshaS ,
Thanks for all your work! That’s really disappointing to hear though :( We can and have tried using the streaming inference API, but that means we completely lose out on the benefits of offline/batch inference. Is there a workaround we could use? Are we not expected to run long-form audio? I thought that’s where attention-based models like Citrinet would shine.

If such batch/offline processing is not supported (at least for files >4 MiB), or is not a pipeline you intend to support, then please let us know, as we’re looking for a platform to serve that purpose primarily, and Jarvis/Riva seemed like a good fit. We would appreciate some indication of what you have in mind for the future regarding this. Yes, for now, we can use streaming inference, but that only makes sense for us as a stand-in if long-form transcripts are eventually possible.

It also looks like Triton supports configuring this [1], and most other inference/training platforms expose this gRPC config [2]. Can we talk to Triton directly?

  1. Increase the GRPC client maximum message size. On infer request error… · triton-inference-server/server@e9b541b · GitHub
  2. https://docs.seldon.io/projects/seldon-core/en/v0.4.17/examples/max_grpc_msg_size.html#Allowing-larger-gRPC-messages

Hi @AakankshaS - Any comments?

Hi @ShantanuNair ,

Running with larger files is not recommended, however, because of the way protobufs work: the whole message object has to be parsed before any field can be accessed, so very large messages add a fair amount of overhead.
Can you please help us understand what size would be sufficient for you?

Thanks!

@AakankshaS
~200 MB to 1 GB, or at least as large as possible. Other production STT and inference platforms do the same: they either have a much larger limit (1 GB) or allow this setting to be configured (typically the client can set it).

Right now, a 16 kHz / 16-bit / 2-channel audio file only about one minute long just barely fits under your batch-processing size limit, which means audio even a few minutes long, with more channels, or at a higher sample rate would force us to use streaming inference.
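To make the arithmetic concrete (raw 16-bit PCM payload, ignoring the small WAV header):

sample_rate = 16_000       # 16 kHz
bytes_per_sample = 2       # 16-bit depth
channels = 2
seconds = 60

payload = sample_rate * bytes_per_sample * channels * seconds
print(payload)             # 3,840,000 bytes, about 3.66 MiB: just under the 4,194,304-byte (4 MiB) default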

I will add supporting material for other platforms and gRPC servers that behave the way we expect.

Thanks for your reply! We are part of the Inception program. Is there any way we could make this back-and-forth more convenient for both of us? If there is an alternate or more convenient channel to discuss this with the Riva team, please let me know.

Thanks for the additional details about your use case.

Adding true offline support is something we’re considering for the future, and we would potentially explore some kind of asynchronous API for that (where you pass in a pointer, file location, S3 URI, etc. to the audio you want transcribed). (FWIW, Citrinet is not attention-based, and currently the offline/batch API uses streaming inference, albeit with significantly larger chunk sizes to improve efficiency.)

To help with your particular issue, we can increase this size limit starting in the 21.10 release.


@rleary

Truly appreciate it! I understand; it looks like I was confusing Citrinet with the Conformer model, since I was evaluating how they compare with different contexts available with respect to offline inference. Really looking forward to the 21.10 release! Thanks a bunch for your and @AakankshaS’s time.

Not being limited by a protocol’s message size would be extremely helpful, and yes, I do think passing in a pointer to the file, such as an S3 URI or file location, would be helpful in addition, but it does not give us the flexibility and power of larger gRPC message sizes.

PyTorch’s TorchServe made the change - Raise the max message size of gRPC for tensorflow model server · Issue #288 · tensorflow/serving · GitHub
as did TensorFlow Serving - gRPC: Received message larger than max (32243844 vs. 4194304)" · Issue #1382 · tensorflow/serving · GitHub, where they had a 2 GB default which is overridden by options provided when creating the client stub.
Similarly, the Triton team also allows this - Increase the max allowed size for grpc messages in python client by tanmayv25 · Pull Request #1799 · triton-inference-server/server · GitHub
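For reference, the same knobs exist on the serving side of gRPC itself. A generic Python sketch of a server built with raised limits is below; this is not Riva’s actual server code (which isn’t exposed in the quickstart), just the stock gRPC API that such platforms build on.

from concurrent import futures
import grpc

# Generic gRPC server with raised receive/send limits; whether a given
# serving platform exposes these options to its users is up to that platform.
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=4),
    options=[
        ("grpc.max_receive_message_length", 1024 * 1024 * 1024),
        ("grpc.max_send_message_length", 1024 * 1024 * 1024),
    ],
)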

Hi @ShantanuNair,

Just wanted to follow up here. The change to increase gRPC message size has been merged. You can expect it in the Riva 21.10 release.

Best,
Ryan


@rleary
Thank you, Ryan! We really appreciate you getting back to us like this, as for a brief moment we were reconsidering building on Riva when we didn’t hear back. Having this line of communication, and some idea of Riva’s roadmap, further secures our ability to make decisions involving Riva and our tech.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.