Riva 1.8 ASR Websocket

Please provide the following information when requesting support.

Hardware - GPU (V100, 16 GB)
Hardware - CPU (Xeon E5-2686 v4)
Operating System -
Riva Version (1.8.0b0)
How to reproduce the issue? (This is for errors. Please share the command and the detailed log here)

Context:

Our problem occurs on the front end while we are trying to stream microphone input through a Python FastAPI server.

  1. Riva (1.8.0b0) backend (available, working)
  2. Using a Websocket wrapper to stream microphone input to the Riva backend
  3. The example transcribe_mic.py works well. However, when we use MediaRecorder on the front end, we receive an UNIMPLEMENTED error:

35073 ERROR api:websocket_endpoint:119 - Inference error: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = ""
debug_error_string = "{"created":"@1641310613.864265000","description":"Error received from peer ipv4:3.145.17.11:8001","file":"src/core/lib/surface/call.cc","file_line":1075,"grpc_message":"","grpc_status":12}"

  4. We compared the raw bytes arriving from PyAudio with those arriving from MediaRecorder:

\x00\x00\xfc\xff\xf8\xff\xf7\xff\xf9\xff\xfa\xff\xf8\xff\xf7\xff\xfa\xff\xfd\xff\xfe\xff\xfe\xff\xfd → PyAudio

\x00\x00\x00\x1cftypiso5\x00\x00\x00\x01isomiso5hlsf\x00\x00\x02Hmoov\x00\x00\x00lmvhd\x00\x00\x00\x00\xdd\xf8\xff{\xdd\xf8\xff → MediaRecorder
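One quick way to tell these two payloads apart on the server is to look at their leading bytes. The sketch below is a heuristic, not part of Riva or FastAPI: the function name `sniff_audio_format` is hypothetical, and the signature list is illustrative rather than exhaustive. Anything without a recognised header is reported as "raw", which is what headerless PCM from PyAudio looks like.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container of an audio payload from its leading bytes.

    Returns "wav", "mp4", "webm", "ogg", or "raw" (no known header found).
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[4:8] == b"ftyp":             # ISO BMFF (MP4): 4-byte box size, then "ftyp"
        return "mp4"
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML magic used by WebM/Matroska
        return "webm"
    if data[:4] == b"OggS":
        return "ogg"
    return "raw"                         # possibly headerless PCM, but not guaranteed
```

On the two samples above, the PyAudio bytes come back as "raw", while the MediaRecorder bytes come back as "mp4" because of the `ftyp` box at offset 4 (brand `iso5`). A "raw" result only means no known header was found; it does not prove the data is 16-bit PCM.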

This is how we handle the incoming data:

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_bytes()
        ## send data to Riva backend (StreamingRecognize)
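Even once the payload is genuine PCM, WebSocket messages arrive in whatever sizes the client chose, and a split can land in the middle of a 16-bit sample. A small buffering class like the hypothetical `PcmRechunker` below could sit between `receive_bytes()` and the code that builds the StreamingRecognize requests. It assumes the input is already 16-bit little-endian PCM (it does not decode MP4 or WebM), and the 1600-frame default (100 ms at an assumed 16 kHz) is just a common choice, not a Riva requirement.

```python
from typing import List


class PcmRechunker:
    """Re-slice arbitrary byte payloads into fixed-size chunks of PCM frames.

    Assumes the payloads are headerless PCM; no decoding is performed.
    """

    def __init__(self, frames_per_chunk: int = 1600, sample_width: int = 2, channels: int = 1):
        self.frame_bytes = sample_width * channels          # bytes per frame
        self.chunk_bytes = frames_per_chunk * self.frame_bytes
        self._buf = bytearray()

    def push(self, data: bytes) -> List[bytes]:
        """Append data and return every complete chunk now available."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_bytes:
            chunks.append(bytes(self._buf[: self.chunk_bytes]))
            del self._buf[: self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return leftover bytes trimmed to whole frames, then reset the buffer."""
        usable = len(self._buf) - len(self._buf) % self.frame_bytes
        tail = bytes(self._buf[:usable])
        self._buf.clear()
        return tail
```

In the loop above, each `data` would go through `push()`, and every returned chunk would become the audio payload of one streaming request; `flush()` drains the remainder when the socket closes.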

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <h1>helloworld</h1>
    <script>
        var ws = new WebSocket("ws://localhost:8000/ws", 'echo-protocol');
        navigator.mediaDevices.getUserMedia({ audio: true })
            .then(stream => {
                const mediaRecorder = new MediaRecorder(stream);
                // Fire a "dataavailable" event with a new chunk every 5000 ms
                mediaRecorder.start(5000);
                mediaRecorder.addEventListener("dataavailable", event => {
                    console.log("sending audio");
                    // The 'audio/mp3' label does not transcode anything; the bytes stay
                    // in MediaRecorder's own container format (MP4 in the dump above)
                    const audioBlob = new Blob([event.data], {type: 'audio/mp3'});
                    ws.send(audioBlob);
                    console.log("sent audio");
                });
            });
    </script>
</body>
</html>

Any recommendations for a workaround?

Thanks in advance!

I have a similar problem in a demo system, so please post here if you find a solution. As a workaround, we simply run Riva standalone (not embedded in FastAPI).

Some tips I picked up when working with Riva:

Make sure your audio is in the proper format: 16-bit little-endian PCM. Riva does not support MP3, so you need to convert it to WAV. You might want to do the conversion server-side to reduce networking overhead.

See https://docs.nvidia.com/deeplearning/riva/user-guide/docs/protobuf-api/protobuf-api-root.html#riva-proto-riva-audio-proto
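A minimal server-side conversion sketch, assuming the `ffmpeg` binary is installed on the FastAPI host (it is not part of Riva) and that the deployed model expects 16 kHz mono. Both values are assumptions and must match the `sample_rate_hertz` sent to Riva. The helper names are hypothetical. ffmpeg decodes whatever container MediaRecorder produced and writes raw `s16le` samples, i.e. the headerless 16-bit little-endian PCM described above.

```python
import subprocess
from typing import List


def ffmpeg_to_pcm_cmd(sample_rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an ffmpeg command: any audio container on stdin -> raw s16le on stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",            # input: whatever MediaRecorder produced (MP4, WebM, ...)
        "-f", "s16le",             # output format: raw samples, no header
        "-acodec", "pcm_s16le",    # codec: signed 16-bit little-endian PCM
        "-ac", str(channels),      # downmix to the channel count the model expects
        "-ar", str(sample_rate),   # resample to the rate the model expects
        "pipe:1",
    ]


def convert_to_pcm(blob: bytes, sample_rate: int = 16000) -> bytes:
    """Decode a complete recording held in memory to raw PCM via ffmpeg."""
    result = subprocess.run(
        ffmpeg_to_pcm_cmd(sample_rate),
        input=blob,
        capture_output=True,
        check=True,                # raise CalledProcessError if ffmpeg fails
    )
    return result.stdout
```

`convert_to_pcm` works on a whole recording at once, so with `mediaRecorder.start(5000)` the chunks would have to be concatenated before conversion.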

Since you're using Node, streams with backpressure should allow for an efficient way to convert it.
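The server in the original question is Python rather than Node, so here is a rough asyncio counterpart of that idea (a sketch; `pipe_through` is a hypothetical helper). `await proc.stdin.drain()` provides the backpressure: the writer pauses whenever the child process falls behind, while stdout is read concurrently so neither pipe can fill up and deadlock.

```python
import asyncio
from typing import AsyncIterator, Sequence


async def pipe_through(cmd: Sequence[str], chunks: AsyncIterator[bytes],
                       read_size: int = 4096) -> AsyncIterator[bytes]:
    """Stream chunks through a subprocess (e.g. ffmpeg) and yield its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )

    async def feed() -> None:
        try:
            async for chunk in chunks:
                proc.stdin.write(chunk)
                await proc.stdin.drain()   # pause while the child's stdin buffer is full
        finally:
            proc.stdin.close()             # EOF tells the child the input is complete

    writer = asyncio.create_task(feed())
    try:
        while True:
            out = await proc.stdout.read(read_size)
            if not out:                    # b"" means the child closed stdout
                break
            yield out
        await writer                       # re-raise any error from the feeding side
    finally:
        if not writer.done():              # consumer stopped early: tear everything down
            writer.cancel()
            if proc.returncode is None:
                proc.kill()
        await proc.wait()
```

In the FastAPI endpoint, `chunks` could be an async generator that yields `await websocket.receive_bytes()` in a loop, and `cmd` an ffmpeg invocation such as `ffmpeg -i pipe:0 -f s16le -acodec pcm_s16le -ac 1 -ar 16000 pipe:1`. Keeping one process alive for the whole session matters because MediaRecorder chunks after the first usually lack the container header and cannot be decoded on their own.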

The first four bytes of a WAV file are typically 52 49 46 46, the ASCII "RIFF" signature.
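If the client does send WAV, the server can check that signature, and the sample format, before forwarding anything. Below is a minimal sketch using only the standard-library `wave` module (the function name `read_pcm16_wav` is hypothetical). It returns the frames without the header, i.e. the headerless 16-bit PCM the tip above asks for. Note that `wave` itself raises `wave.Error` for non-PCM encodings such as float.

```python
import io
import wave
from typing import Tuple


def read_pcm16_wav(data: bytes) -> Tuple[bytes, int, int]:
    """Validate a 16-bit PCM WAV payload and return (frames, sample_rate, channels).

    Raises ValueError if the RIFF/WAVE signature or the sample width is wrong.
    """
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":   # 52 49 46 46 ... 57 41 56 45
        raise ValueError("not a RIFF/WAVE file")
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        frames = wav.readframes(wav.getnframes())      # sample data only, header stripped
        return frames, wav.getframerate(), wav.getnchannels()
```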