Streaming audio from Riva TTS to Audio2Face


Workstation
Hardware - GPU (GeForce RTX 2070)
Hardware - CPU
Operating System - Win 10
Riva Version 2.14.0
TLT Version (if relevant)

  • I am trying to modify the Nemotron-3-8B (no streaming) script from the NGC catalog's foundation models.
  • I want to send the output through a local Riva TTS server to Audio2Face.
    – I can see that I’m able to send the request to the Riva TTS server.
    – I am trying to figure out how I can play the audio through my speakers. This isn't working at the moment.
    – I am trying to figure out how I can then send this to the Audio2Face streaming player that I have set up. I don’t need to connect any head or mesh for now; I just need the audio to go into Audio2Face. I’m having a hard time finding any reference for this.

Nemotron script

import requests

invoke_url = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/0c60f14d-46cb-465e-b994-227e1c3d5047"
fetch_url_format = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/"

headers = {
    "Authorization": "Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
    "Accept": "application/json",
}

payload = {
  "messages": [
    {
      "content": "The Apollo 11 mission was the first mission to land humans on the moon. The mission was launched on July 16, 1969, and Neil Armstrong and Buzz Aldrin became the first humans to walk on the moon on July 20, 1969.",
      "role": "context"
    },
    {
      "content": "What was the purpose of the Apollo 11 mission?",
      "role": "user"
    }
  ],
  "temperature": 0.2,
  "top_p": 0.7,
  "max_tokens": 1024,
  "bad": "bad",
  "stop": "city",
  "stream": False
}

# re-use connections
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)

while response.status_code == 202:
    request_id = response.headers.get("NVCF-REQID")
    fetch_url = fetch_url_format + request_id
    response = session.get(fetch_url, headers=headers)

response.raise_for_status()
response_body = response.json()
print(response_body)
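The polling loop above re-requests the status URL as fast as it can while the function keeps returning HTTP 202 (which the sample treats as "still running", reading the request ID from the `NVCF-REQID` header). A minimal sketch of the same pattern with a pause between polls and an attempt cap; the helper name `poll_nvcf` and the delay/cap values are my own assumptions, not part of the NGC sample.

```python
import time


def poll_nvcf(session, response, fetch_url_format, headers,
              delay_s=0.5, max_attempts=120):
    """Poll an NVCF pexec request until it stops returning 202 (pending).

    `session` is anything with a requests-style .get(); the delay and the
    attempt cap are illustrative assumptions, not NVCF requirements.
    """
    attempts = 0
    while response.status_code == 202:
        if attempts >= max_attempts:
            raise TimeoutError("NVCF request still pending after polling")
        request_id = response.headers.get("NVCF-REQID")
        time.sleep(delay_s)  # avoid hammering the status endpoint
        response = session.get(fetch_url_format + request_id, headers=headers)
        attempts += 1
    response.raise_for_status()
    return response.json()
```

Usage, reusing the names from the script: `response_body = poll_nvcf(session, session.post(invoke_url, headers=headers, json=payload), fetch_url_format, headers)`.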

My addition

import numpy as np
import IPython.display as ipd
import riva.client

...

auth = riva.client.Auth(uri='localhost:50052')

riva_tts = riva.client.SpeechSynthesisService(auth)

sample_rate_hz = 44100
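req = {
        "language_code"  : "en-US",
        "encoding"       : riva.client.AudioEncoding.LINEAR_PCM,    # LINEAR_PCM and OGGOPUS encodings are supported
        "sample_rate_hz" : sample_rate_hz,                          # Generate 44.1 kHz audio
        "voice_name"     : "English-US.Female-1"                    # The name of the voice to generate
}

# Synthesize the model's answer, not the literal string "response_body"
# (the server log's total_characters: 13 is exactly len("response_body"))
req["text"] = response_body["choices"][0]["message"]["content"]
resp = riva_tts.synthesize(**req)
audio_samples = np.frombuffer(resp.audio, dtype=np.int16)

# ipd.Audio displays a player only inside a Jupyter notebook
ipd.Audio(audio_samples, rate=sample_rate_hz)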
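To hear the result when running from a plain terminal script (where `ipd.Audio` has no audible effect), one option is to write Riva's LINEAR_PCM bytes to a WAV file with the standard-library `wave` module and, on Windows, play it with `winsound`. A minimal sketch, assuming Riva returns mono 16-bit samples (the `np.int16` decoding above already assumes this); the helper names `save_pcm16_wav` and `play_wav` are hypothetical.

```python
import sys
import wave


def save_pcm16_wav(pcm_bytes, path, sample_rate_hz, channels=1):
    """Write raw 16-bit PCM bytes (e.g. resp.audio from Riva) to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)


def play_wav(path):
    """Play a WAV file through the default output device (Windows only)."""
    if sys.platform != "win32":
        raise OSError("winsound is only available on Windows")
    import winsound  # standard library, Windows-only
    winsound.PlaySound(path, winsound.SND_FILENAME)  # blocks until playback ends
```

For example, `save_pcm16_wav(resp.audio, "output.wav", sample_rate_hz)` followed by `play_wav("output.wav")`; the same `output.wav` can then be passed to test_client.py.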

VS Code terminal - Nemotron-3-8B output

{'id': 'fdf95293-7e7e-453f-8a3b-d99422f97f59', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'to land humans on the moon'}, 'finish_reason': 'stop'}], 'usage': {'completion_tokens': 6, 'prompt_tokens': 135, 'total_tokens': 141}}
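The output shows that the answer text sits under `choices[0].message.content`. A small helper to pull out only that text before handing it to Riva, assuming the response always has the shape printed above; `extract_answer` is a hypothetical name.

```python
def extract_answer(response_body):
    """Return the assistant text from a chat-style NVCF response dict."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"].strip()
```

With the output above, `extract_answer(response_body)` returns `'to land humans on the moon'`.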

Docker - Riva TTS Server

2024-03-08 21:38:23 I0308 20:38:23.637698   306 grpc_riva_tts.cc:465] TTSService.Synthesize returning OK
2024-03-08 21:38:23 I0308 20:38:23.638233   306 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeoffline.v1","source":"","subject":"","id":"9eaef620-b9ed-4728-87f7-b82aa2ea4f9e","datacontenttype":"application/json","time":"2024-03-08T20:38:22.895382911+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":13,"audio_duration":0.5195465087890625,"encoding":"LINEAR_PCM","status":0,"err_msg":""}}

Currently

  • Studying test_client.py (paused)
  • Studying the riva_tts extension (paused)
  • Studying this: Audio2Face Headless and RestAPI Overview: https://www.youtube.com/watch?v=bnLz94I9mZo
  • I can run talk.py from the quickstart examples with arguments to play audio (paused)

What I’ve learned so far:

  • I get that the headless mode and REST API might be useful later on, but for my immediate purposes I would like to avoid complicating things further and keep things “simple”.
  • test_client.py ← seems to provide the closest structure to what I’m hoping to set up.

So far, this is what I’ve gathered it can do:

  • From the file:

But in a real application such stream of chunks may be aquired from some other streaming source: streaming audio via internet, streaming Text-To-Speech, etc
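That comment is the bridge between the two workstreams: Riva returns 16-bit integer PCM bytes, while test_client.py pushes float audio to A2F in chunks. A minimal sketch of the conversion and chunking step, assuming A2F expects float32 samples in [-1, 1] and that roughly 100 ms chunks are reasonable (my reading of test_client.py, not a confirmed requirement). The function names are hypothetical, and the actual gRPC request objects still come from test_client.py.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes):
    """Convert raw 16-bit PCM bytes (Riva LINEAR_PCM) to float32 in [-1.0, 1.0)."""
    samples = np.frombuffer(pcm_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0


def iter_chunks(audio, sample_rate_hz, chunk_ms=100):
    """Yield consecutive slices of about chunk_ms milliseconds each."""
    chunk_size = max(1, sample_rate_hz * chunk_ms // 1000)
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

For example, `for chunk in iter_chunks(pcm16_to_float32(resp.audio), sample_rate_hz):` gives slices whose `chunk.tobytes()` would be the audio payload of each streaming request, so no intermediate `output.wav` is needed.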

What I’m trying to figure out is how it all strings together using the 2.14.0 quickstart server and client in Docker plus the NGC foundation model:

  • I send a question as a string to the foundation model through a URL + API key (done)

  • I get back a response as text (done)

  • That text is sent to the local Riva TTS Docker server (done)
    – Hosted on a different port than the A2F server, e.g. localhost:50052 (done)

  • Send the output from the local Riva TTS Docker server to A2F
    – Specify the A2F server address: localhost:50051 (confirmed)
    – Specify the streaming player path, something like /World/audio2face/PlayerStreaming (confirmed)

  • Have an open scene in A2F with a streaming player set up to receive audio (done)
    – Streaming audio player created
    – RivaTTS plugin enabled
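The steps above can be sketched as one function that wires the three stages together. Each stage is passed in as a callable (for example the NVCF request, the Riva `synthesize` call, and the A2F push from test_client.py), so the sketch makes no assumptions about those APIs; `PipelineConfig`, `run_pipeline`, and the stage parameter names are hypothetical, and the addresses are the ones listed above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PipelineConfig:
    riva_uri: str = "localhost:50052"   # local Riva TTS server
    a2f_url: str = "localhost:50051"    # Audio2Face gRPC endpoint
    a2f_instance: str = "/World/audio2face/PlayerStreaming"


def run_pipeline(
    question: str,
    config: PipelineConfig,
    ask_llm: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    push_audio: Callable[[bytes, str, str], None],
) -> str:
    """Question -> LLM answer -> Riva audio -> A2F streaming player."""
    if config.riva_uri == config.a2f_url:
        raise ValueError("Riva and A2F must listen on different ports")
    answer = ask_llm(question)
    audio = synthesize(answer, config.riva_uri)
    push_audio(audio, config.a2f_url, config.a2f_instance)
    return answer
```

Keeping the stages injectable means each one can be tested on its own (as was done here with test_client.py) before chaining them.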

I got test_client.py to work, which means I can send audio to A2F. Now I just need to work out how to merge the two workstreams.

I basically got this to work now.

  • Using the NGC Mistral no-stream sample code, I can send:

I am going to Stockholm, what should I see?

  • I limit and filter the JSON response:
Stockholm, the capital city of Sweden, is a beautiful and culturally rich destination with many attractions to see and explore. Here are some must-visit places in Stockholm:

1. Gamla Stan: The old town of Stockholm is a charming and historic area with narrow, winding streets, colorful buildings, and many shops, restaurants, and cafes.
2. The Royal Palace: The official residence of the Swedish monarch, the Royal Palace is a stunning baroque building with over 600 rooms, three museums, and the Royal Armory.
3. Vasa Museum: This
  • I synthesize it using the 2.14.0 quickstart Docker server:

output.wav

  • I use test_client.py to send it to A2F
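The filtered response above still ends mid-sentence ("3. Vasa Museum: This"), likely because of the response limit, so the synthesized audio would cut off abruptly. One possible extra filter before TTS, assuming it is acceptable to drop the unfinished tail: trim the text back to the last sentence-ending punctuation, ignoring numbered-list markers such as "3.". `trim_to_last_sentence` and its `max_chars` cap are hypothetical, not part of the NGC sample.

```python
import re

# ".", "!" or "?" followed by whitespace or the end of the text
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_to_last_sentence(text, max_chars=None):
    """Cut text back to its last complete sentence (optionally capped first)."""
    if max_chars is not None:
        text = text[:max_chars]
    text = text.strip()
    for match in reversed(list(_SENTENCE_END.finditer(text))):
        line_start = text.rfind("\n", 0, match.start()) + 1
        if text[line_start:match.start()].strip().isdigit():
            continue  # "3." at the start of a line is a list marker, not a sentence end
        return text[:match.end()]
    return text  # no sentence boundary found; keep the text as-is
```

Applied to the Stockholm answer, this would end the text after "...and the Royal Armory." instead of on "3. Vasa Museum: This".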

Error

  • Separately, they work. But as soon as I start A2F, speech synthesis on my 2.14.0 quickstart server stops working, so I can’t get output.wav.
    – It just freezes when trying to synthesize (my whole computer gets very sluggish).
    – Eventually it exits because of a timeout, at which point I need to restart the 2.14.0 quickstart server and close A2F.
2024-03-12 14:12:47 I0312 13:12:47.146572   490 grpc_riva_tts.cc:312] TTSService.Synthesize called.
2024-03-12 14:12:47 I0312 13:12:47.146772   490 grpc_riva_tts.cc:339] Using multispeaker model fastpitch_hifigan_ensemble-English-US for inference with speaker_id: 0
2024-03-12 14:16:31 I0312 13:16:31.509377   490 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeoffline.v1","source":"","subject":"","id":"58d53d97-e78e-4001-af8d-18320d115113","datacontenttype":"application/json","time":"2024-03-12T13:12:47.146525451+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"","request_count":1,"total_characters":0,"audio_duration":0.0,"status":2,"err_msg":"Error: Triton model failed during inference. Error message: Streaming timed out"}}
  • I’ve changed the ports to 50052 instead of 50051 and restarted the server.
  • Could it be the Triton server that is giving me issues?

Solution

I can’t have the viewport running at the same time. I’m assuming I’m running out of memory (LOL). I will investigate with a more powerful machine.

