I would like to use the jetson container AudioCraft to run a server that offers an API. I want to do something similar to what I’ve already done with stable-diffusion (see here).
In the official tutorial I see that a Jupyter web server is started, but I don’t understand why it isn’t possible to have a front-end equivalent to what we have with stable-diffusion (MusicGen: see image).
Is it possible, with the current jetson container of AudioCraft, to enable a server with endpoints in a similar fashion to what is already possible with stable-diffusion?
(Optional) Is it possible to enable the MusicGen front-end in a similar fashion to the stable-diffusion front-end?
@esteban.gallardo just browsing through the AudioCraft code on github, I don’t see it supporting REST APIs. It has API documentation for Python here:
Note that the jetson-containers for stable-diffusion-webui and text-generation-webui just run those projects, and the projects themselves implement the REST APIs you are using. The REST API runs inside the container when you start those apps with the corresponding flags, but I didn’t add the code implementing it to those projects. If a project doesn’t implement one, you would need to expose it yourself (i.e. via a Python script that loads the model and uses flask/fastapi/etc. to serve your desired REST endpoints)
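For example, a minimal sketch of what such a script could look like, wrapping the standard audiocraft MusicGen API in a Flask endpoint (the model size, route, port, and temp path are just placeholders you would adapt):

```python
# minimal sketch: wrap audiocraft MusicGen in a Flask endpoint
# (run inside the audiocraft container, where torch + audiocraft are already installed)
from flask import Flask, request, send_file
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

app = Flask(__name__)

# load the model once at startup (placeholder model size)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # seconds of audio per request

@app.route('/generate', methods=['POST'])
def generate():
    prompt = request.json.get('prompt', '')
    wav = model.generate([prompt])                 # tensor of generated audio
    audio_write('/tmp/out', wav[0].cpu(), model.sample_rate, strategy='loudness')
    return send_file('/tmp/out.wav', mimetype='audio/wav')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```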
I tried to import the libraries to run it in my Flask application. Unfortunately, I have not been able to import any of them successfully. I’ve spent several days compiling repos, installing wheels, etc., trying to get the whole thing working, without any luck. The Torch libraries are a nightmare. So far I work with Torch without CUDA available, because it’s impossible to do it any other way. When I manage to install Torch with CUDA, nothing else is compatible with it: torchvision is not compatible, torchaudio is not compatible, nothing works. You can clone the repos and compile them; nothing works. I will wait until there is good integration of the torch libraries on the Jetson AGX Orin.
Try building your Flask application on top of the audiocraft container, which already has it installed and PyTorch working. PyTorch, torchvision, torchaudio, etc. do work on Jetson, you just need to have the CUDA-enabled versions installed (or build them yourself). The containers make sure the correct versions stay installed.
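A quick sanity check you can run inside the container to confirm that the CUDA-enabled builds are the ones actually being imported (this just uses the standard torch/torchvision/torchaudio version attributes, nothing container-specific):

```python
# quick check that the CUDA-enabled PyTorch stack is what gets imported
import torch
import torchvision
import torchaudio

print('torch       :', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('torchvision :', torchvision.__version__)
print('torchaudio  :', torchaudio.__version__)
if torch.cuda.is_available():
    print('device      :', torch.cuda.get_device_name(0))  # should report the Orin GPU
```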
Hi @esteban.gallardo, sorry for the delay - if you are still stuck on this, try running the voicecraft container instead (it is based on the audiocraft container, but also installs ffmpeg). You can still run the original audiocraft in it (or voicecraft).
Thanks for your support. Unfortunately, there are still missing libraries when trying to load and generate audio.
By any chance, is there any information about how to create a custom container? I’m a front-end developer (Unity3D), but I would like to try it myself: derive from the jetson PyTorch container and install audiocraft and voicecraft with CUDA support.
@esteban.gallardo what errors did you encounter, or which libraries are missing? I tried running them without issue; sorry that you are still having problems.
You can follow any Docker tutorial to create your own Dockerfile and build it, or if you want to utilize the packages already supported in jetson-containers, see here:
Also, regarding Riva, I’ve done all the steps of this tutorial and it doesn’t work.
Right now I’m working with Coqui-ai TTS without CUDA support (it takes ages to do anything), because it’s impossible to install PyTorch with CUDA support.
It would be great to have at least one working option for Text-To-Speech for the Jetson AGX Orin.
On the other hand, even though Text-To-Speech has more priority for me, I will also need the possibility to generate Text-To-Audio in the future (to create FX sounds). I haven’t seen any web front-end with an API for VoiceGen or MusicGen, so it would be nice to be able to install Python Flask so that programmers have the flexibility to implement endpoint services.
@esteban.gallardo what I have mainly been using for text-to-speech is Piper TTS:
It is lightweight, optimized with onnxruntime+CUDA, and sounds decent with the high-quality version of their models (I typically use en_US-libritts-high and pick one of the many voices that sound good)
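If it helps, a minimal sketch of driving Piper from Python by shelling out to the piper CLI; the model filename and the --model / --output_file flags follow the piper README, but treat them as assumptions and check against your installed version:

```python
# minimal sketch: call the piper CLI from Python to synthesize a WAV file
# assumes `piper` is on PATH and an en_US-libritts-high .onnx model has been downloaded
import subprocess

def piper_tts(text: str, out_path: str = 'speech.wav',
              model: str = 'en_US-libritts-high.onnx') -> str:
    # piper reads the text on stdin and writes the audio to --output_file
    subprocess.run(
        ['piper', '--model', model, '--output_file', out_path],
        input=text.encode('utf-8'),
        check=True,
    )
    return out_path

piper_tts('Welcome to the world of speech synthesis!')
```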
Riva streaming TTS has a known issue right now with the timeouts - it should be resolved in the next release. I also have a container for Coqui xTTS, but even with TensorRT optimizations applied, it is still sub-realtime on AGX Orin. Also, unfortunately, I believe they have ceased development.
In NanoLLM and Agent Studio, I support plugins for both Riva TTS and Piper TTS. Riva does still sound better, I believe, but takes more memory, so on the smaller Jetsons (like Orin Nano) I will use Piper.
Yes, Piper works and it’s extremely fast, but it doesn’t give me the quality I need. It’s as bad as Google’s Text-To-Speech. My project is about creating audiobooks, and Piper is useless for that. Coqui-AI gives me the quality I need, but it can take up to 3 minutes to synthesize a 3-sentence paragraph, and I can have much longer paragraphs.
Right now I have plenty to program on the front-end side, but I suppose that next week I’ll try to build a container so I can use VoiceCraft with CUDA support. Since I’m no expert in Docker, I assume I will spend one or two weeks until I have enough knowledge to do it. I would prefer to keep working on the front-end, but I really need decent AI text-to-speech generation.
On AGX Orin, this gets a realtime factor between ~0.92 and 1.0 in streaming mode, using TensorRT in my fork at github.com/dusty-nv/TTS
It sounds good (and the voice cloning feature works and is cool), but it is still too slow for my uses. You can slow the voice rate down a smidge to match that without noticing it too badly (normally I actually speed it up a bit, due to the verbose bot output). Regardless, I need TTS much faster than realtime so it doesn’t consume the whole GPU, allowing it to run in the background alongside other models.
Riva TTS still works in offline mode (meaning the entire generated audio is returned at once, instead of streamed in chunks). And since it is fast, you can still approximate streaming by making offline requests for each sentence, or a few sentences, at a time.
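A rough sketch of that pattern, with the actual TTS call left as a placeholder for whatever offline synthesis request you use; the synthesize_offline helper and the naive sentence split are purely illustrative:

```python
# rough sketch: approximate streaming TTS by synthesizing one sentence at a time
# `synthesize_offline` is a placeholder for your offline TTS call (e.g. a Riva request)
import re

def synthesize_offline(sentence: str) -> bytes:
    # placeholder: replace with your offline TTS request, returning WAV/PCM bytes
    return b''

def pseudo_streaming_tts(text: str):
    # naive sentence split; keeps each offline request short so latency stays low
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    for sentence in sentences:
        audio = synthesize_offline(sentence)   # one short offline request per sentence
        yield audio                            # play or queue each chunk as it arrives

for chunk in pseudo_streaming_tts('First sentence. Second sentence. Third one!'):
    pass  # send `chunk` to your audio output / client
```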
The issues you mention about getting these packages to use CUDA (and not uninstall each other, etc.) are why I use the containers.