Please provide the following information when requesting support.
Workstation
Hardware - GPU (GeForce RTX 2070)
Hardware - CPU
Operating System - Win 10
Riva Version 2.14.0
TLT Version (if relevant)
- I am trying to modify the non-streaming Nemotron-3-8B script from the NGC catalog (foundation models).
- I want to send its output through a local Riva TTS server to Audio2Face.
– I can see that my request reaches the Riva TTS server.
– I am trying to figure out how to play the audio through my speakers. This is not working at the moment.
– I am trying to figure out how to then send this audio to the Audio2Face Streaming player that I have set up. I don't need to connect a head or mesh for now; I just need the audio to go into Audio2Face. I'm having a hard time finding any reference for this.
Nemotron script
import requests
invoke_url = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/0c60f14d-46cb-465e-b994-227e1c3d5047"
fetch_url_format = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/"
headers = {
    "Authorization": "Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC",
    "Accept": "application/json",
}
payload = {
    "messages": [
        {
            "content": "The Apollo 11 mission was the first mission to land humans on the moon. The mission was launched on July 16, 1969, and Neil Armstrong and Buzz Aldrin became the first humans to walk on the moon on July 20, 1969.",
            "role": "context"
        },
        {
            "content": "What was the purpose of the Apollo 11 mission?",
            "role": "user"
        }
    ],
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024,
    "bad": "bad",
    "stop": "city",
    "stream": False
}
# re-use connections
session = requests.Session()
response = session.post(invoke_url, headers=headers, json=payload)
while response.status_code == 202:
    request_id = response.headers.get("NVCF-REQID")
    fetch_url = fetch_url_format + request_id
    response = session.get(fetch_url, headers=headers)
response.raise_for_status()
response_body = response.json()
print(response_body)
My addition
import numpy as np
import IPython.display as ipd
import riva.client
...
auth = riva.client.Auth(uri='localhost:50052')
riva_tts = riva.client.SpeechSynthesisService(auth)
sample_rate_hz = 44100
req = {
    "language_code": "en-US",
    "encoding": riva.client.AudioEncoding.LINEAR_PCM,  # LINEAR_PCM and OGGOPUS encodings are supported
    "sample_rate_hz": sample_rate_hz,  # Generate 44.1 kHz audio
    "voice_name": "English-US.Female-1"  # The name of the voice to generate
}
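# Send the generated answer, not the literal string "response_body"
# (the server log's "total_characters":13 equals len("response_body")).
req["text"] = response_body["choices"][0]["message"]["content"]
resp = riva_tts.synthesize(**req)
audio_samples = np.frombuffer(resp.audio, dtype=np.int16)
# ipd.Audio only renders a player inside a Jupyter notebook; from a terminal it plays nothing.
ipd.Audio(audio_samples, rate=sample_rate_hz)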
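Since the script runs from the VS Code terminal rather than a notebook, one way to confirm that the synthesized audio is valid is to write it to a WAV file and open it in any media player. The following is a minimal sketch using only the standard library. `save_pcm_as_wav` is a hypothetical helper name, the silent buffer stands in for `resp.audio`, and mono output is an assumption about the TTS response.

```python
import os
import tempfile
import wave


def save_pcm_as_wav(pcm_bytes: bytes, path: str, sample_rate_hz: int = 44100) -> float:
    """Write raw 16-bit mono PCM to a WAV file and return its duration in seconds."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: the TTS audio is mono
        wav_file.setsampwidth(2)   # 2 bytes per sample, matching the int16 dtype above
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm_bytes)
    # Each mono frame is 2 bytes, so frames = bytes / 2.
    return len(pcm_bytes) / 2 / sample_rate_hz


# Stand-in for resp.audio: one second of silence at 44.1 kHz.
fake_audio = b"\x00\x00" * 44100
wav_path = os.path.join(tempfile.gettempdir(), "tts_output.wav")
duration = save_pcm_as_wav(fake_audio, wav_path)
print(wav_path, duration)
```

In the real script the call would be `save_pcm_as_wav(resp.audio, "tts_output.wav", sample_rate_hz)`. For live playback through the speakers, the Riva Python client's sample TTS script appears to use `riva.client.audio_io.SoundCallBack` (which needs PyAudio); that is worth checking against the installed client version.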
VS Code terminal - Nemotron-3-8B output
{'id': 'fdf95293-7e7e-453f-8a3b-d99422f97f59', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'to land humans on the moon'}, 'finish_reason': 'stop'}], 'usage': {'completion_tokens': 6, 'prompt_tokens': 135, 'total_tokens': 141}}
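The output above shows that the generated text sits under `choices[0]["message"]["content"]`, so that string, rather than the whole dictionary or its variable name, is what the TTS request needs. A minimal sketch for pulling it out, assuming the response always has the shape shown above; `extract_reply` is a hypothetical helper name, not part of any NVIDIA library:

```python
def extract_reply(response_body: dict) -> str:
    """Return the assistant text from a chat-completion style response."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Sample copied from the terminal output above, reduced to the relevant keys.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "to land humans on the moon"},
            "finish_reason": "stop",
        }
    ]
}
reply = extract_reply(sample)
print(reply)
```

The returned string can then be assigned to the `text` field of the TTS request.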
Docker - Riva TTS Server
2024-03-08 21:38:23 I0308 20:38:23.637698 306 grpc_riva_tts.cc:465] TTSService.Synthesize returning OK
2024-03-08 21:38:23 I0308 20:38:23.638233 306 stats_builder.h:164] {"specversion":"1.0","type":"riva.tts.synthesizeoffline.v1","source":"","subject":"","id":"9eaef620-b9ed-4728-87f7-b82aa2ea4f9e","datacontenttype":"application/json","time":"2024-03-08T20:38:22.895382911+00:00","data":{"release_version":"2.14.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"total_characters":13,"audio_duration":0.5195465087890625,"encoding":"LINEAR_PCM","status":0,"err_msg":""}}
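For the Audio2Face step, the sketch below shows one possible way to push the synthesized audio to a Streaming player over gRPC. It rests on several assumptions that should be verified against the local install: the `audio2face_pb2` and `audio2face_pb2_grpc` stubs and the `PushAudioRequest` fields follow the sample streaming client bundled with Audio2Face, the player expects float32 samples, and both the URL `localhost:50051` and the prim path `/World/audio2face/PlayerStreaming` are placeholders. The int16 to float32 conversion is the only part that runs without Audio2Face.

```python
import array
import sys


def pcm16_to_float32_bytes(pcm_bytes: bytes) -> bytes:
    """Convert little-endian int16 PCM to native float32 samples in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; array uses native order
    floats = array.array("f", (s / 32768.0 for s in samples))
    return floats.tobytes()


def push_to_audio2face(pcm_bytes, sample_rate_hz,
                       url="localhost:50051",  # assumption: placeholder address
                       instance_name="/World/audio2face/PlayerStreaming"):  # assumption
    """Send one audio clip to an Audio2Face Streaming player (unverified sketch)."""
    # Imported lazily so the conversion helper works without these packages.
    import grpc
    import audio2face_pb2        # assumption: stubs copied from the A2F sample client
    import audio2face_pb2_grpc

    with grpc.insecure_channel(url) as channel:
        stub = audio2face_pb2_grpc.Audio2FaceStub(channel)
        request = audio2face_pb2.PushAudioRequest(
            audio_data=pcm16_to_float32_bytes(pcm_bytes),
            samplerate=sample_rate_hz,
            instance_name=instance_name,
            block_until_playback_is_finished=True,
        )
        return stub.PushAudio(request)


# Conversion check on three int16 samples: 0, 16384, -32768 (little-endian bytes).
demo = pcm16_to_float32_bytes(b"\x00\x00\x00\x40\x00\x80")
print(len(demo))
```

With the stubs on the Python path and the Streaming player loaded in the stage, the call would look like `push_to_audio2face(resp.audio, sample_rate_hz)`.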