Attempt to transcribe audio file fails (detected audio length is 0)

I am trying to get Riva up and running for my .NET project. My test application uses gRPC to talk to a locally run Riva Docker container. When I try to transcribe the en-US_sample.wav audio file provided in the examples, the Riva service processes the request but seems to detect an audio length of 0. When I run the example transcribe_file_offline.py on the same file from the Riva client Docker container, the file is processed correctly and the correct audio length (4.8 seconds) is detected.
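For anyone who wants to confirm the expected duration independently of Riva, here is a minimal sketch that reads it from the WAV header with Python's standard wave module. It assumes the file is plain PCM WAV (which matches the "encoding = 1" format the server detects); the function name wav_duration_seconds is just something I made up for illustration.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration in seconds of a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


# Example usage (the path is an assumption, adjust to where the sample lives):
# print(wav_duration_seconds("en-US_sample.wav"))
```

If this reports roughly 4.8 seconds while the server logs an audio_duration of 0.0, the file itself is fine and the problem is in the request.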

I have already compared the bytes that the Python program and my .NET application read from the test audio file, and they are identical. I am also confident that the config I pass in the RecognizeRequest from my .NET application is identical, as can also be seen in the logs below. Any idea what might be going on here?

Request from my .NET application:

2024-01-29 02:37:29 I0129 01:37:29.469130   329 grpc_riva_asr.cc:678] ASRService.Recognize called.
2024-01-29 02:37:29 I0129 01:37:29.469235   329 riva_asr_stream.cc:226] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
2024-01-29 02:37:29 I0129 01:37:29.469381   329 grpc_riva_asr.cc:1025] Using model conformer-en-US-asr-offline from Triton localhost:8001
2024-01-29 02:37:30 I0129 01:37:30.006122   329 grpc_riva_asr.cc:1095] ASRService.Recognize returning OK
2024-01-29 02:37:30 I0129 01:37:30.006480   329 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.recognize.v1","source":"","subject":"","id":"6c6924ed-3b01-4b4d-85fb-29501775d1ad","datacontenttype":"application/json","time":"2024-01-29T01:37:29.469025175+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"audio_duration":0.0,"speech_duration":0.0,"status":0,"err_msg":""}}

Request from the example python program:

2024-01-29 02:38:03 I0129 01:38:03.283649   307 grpc_riva_asr.cc:678] ASRService.Recognize called.
2024-01-29 02:38:03 I0129 01:38:03.283711   307 riva_asr_stream.cc:226] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
2024-01-29 02:38:03 I0129 01:38:03.283916   307 grpc_riva_asr.cc:1025] Using model conformer-en-US-asr-offline from Triton localhost:8001
2024-01-29 02:38:03 I0129 01:38:03.308118   307 grpc_riva_asr.cc:1095] ASRService.Recognize returning OK
2024-01-29 02:38:03 I0129 01:38:03.308494   307 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.recognize.v1","source":"","subject":"","id":"dab5aef7-7b35-4b09-bc40-569cccf4a437","datacontenttype":"application/json","time":"2024-01-29T01:38:03.283636962+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"audio_duration":4.800000190734863,"speech_duration":0.0,"status":0,"err_msg":""}}

GPU: RTX 3070Ti
CPU: Intel i7-12700K
Operating System: Windows 11
Riva Version: 2.14.0

So, I found my problem. There appears to be a bug in the Riva ASR server implementation. According to the documentation, when the config member “MaxAlternatives” is not set, the server returns a maximum of 1 alternative. That is not what happens: the server instead reports that a file of length 0 was sent and processed, and the response contains no data. When I explicitly set MaxAlternatives in the config, even if it's only to 1, I get the result I expect.
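For reference, here is roughly what the workaround looks like when sketched with the Python client (the riva.client package from nvidia-riva-client), where the same config field is called max_alternatives. This is a sketch under assumptions, not the exact code I ran: the server address localhost:50051 and the helper names build_recognition_kwargs and transcribe are placeholders of mine.

```python
def build_recognition_kwargs(language_code="en-US", max_alternatives=1):
    """Build config fields, always setting max_alternatives explicitly."""
    if max_alternatives < 1:
        raise ValueError("max_alternatives must be at least 1")
    return {
        "language_code": language_code,
        # Set explicitly instead of relying on the documented default.
        "max_alternatives": max_alternatives,
    }


def transcribe(path, server="localhost:50051"):
    """Send one offline Recognize request (server address is an assumption)."""
    import riva.client  # imported here so the helper above works without it

    auth = riva.client.Auth(uri=server)
    asr_service = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(**build_recognition_kwargs())
    with open(path, "rb") as fh:
        data = fh.read()
    return asr_service.offline_recognize(data, config)
```

In the .NET client the equivalent is simply assigning MaxAlternatives = 1 on the RecognitionConfig before sending the RecognizeRequest.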
