Build Ovi on DGX Spark
I came across Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation this morning and wanted to see if I could install it.
“Ovi is a veo-3 like, video+audio generation model that simultaneously generates both video and audio content from text or text+image inputs.”
I tried to follow the Step-by-Step installation instructions to build it on the DGX Spark, but found I had to make some slight tweaks to get it to work.
git clone https://github.com/character-ai/Ovi.git
cd Ovi
# use uv rather than virtualenv
uv venv
source .venv/bin/activate
# these are sort of cargo-culty, but sometimes seem to be needed
export TRITON_PTXAS_PATH=$(which ptxas)
export CUDA_HOME=/usr/local/cuda
# needed to add the --torch-backend flag
uv pip install torch torchvision torchaudio --torch-backend auto
# this line worked
uv pip install -r requirements.txt
# needed to add MAX_JOBS, otherwise the build used all the CPUs
# and then ran out of memory, triggering the OOM killer
MAX_JOBS=4 uv pip install flash_attn --no-build-isolation
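MAX_JOBS=4 is just the value that worked here. If you want to derive a starting point for another machine, here is a rough sketch that caps the number of parallel compile jobs by both CPU count and RAM. The helper name pick_max_jobs and the memory-per-job figure are my own assumptions, not something from the Ovi or flash_attn docs, and the numbers in the example call are illustrative rather than measured.

```shell
# Hypothetical helper (not part of Ovi or flash_attn): suggest a MAX_JOBS value.
# Usage: pick_max_jobs CPUS MEM_GB GB_PER_JOB
# GB_PER_JOB is a guess at how much RAM one compile job needs;
# raise it if the build still gets OOM-killed.
pick_max_jobs() {
  by_mem=$(( $2 / $3 ))   # how many jobs fit in memory
  jobs=$1                 # start from the CPU count
  if [ "$by_mem" -lt "$jobs" ]; then
    jobs=$by_mem          # memory is the tighter limit
  fi
  if [ "$jobs" -lt 1 ]; then
    jobs=1                # always allow at least one job
  fi
  echo "$jobs"
}

# Illustrative numbers only: 20 CPUs, 128 GB RAM, 32 GB assumed per job.
pick_max_jobs 20 128 32
```

You could then pass the result along, e.g. MAX_JOBS=$(pick_max_jobs "$(nproc)" 128 32) in front of the flash_attn install line, and lower it if the OOM killer still shows up.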
Then, I could skip to Download Weights and the rest of the steps all worked.
It’s pretty fun. It takes about 15 minutes to generate a 5-second video. So far it’s worked best for me when I give it an image to start with. When running inference.py to generate the videos, the GPU peaked at around 50 watts and got up to about 150 °F, iirc.
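If you'd rather log the power and temperature numbers than go from memory, here is a small sketch using nvidia-smi's query mode. The function names c_to_f and watch_gpu are made up for this example. nvidia-smi reports temperature in Celsius, so the first helper converts to Fahrenheit (150 °F is roughly 66 °C). I'm assuming the Spark's driver fills in both fields; on some systems a field can come back as N/A.

```shell
# Hypothetical helper: convert Celsius (what nvidia-smi reports) to Fahrenheit.
c_to_f() {
  awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

# Hypothetical helper: print a CSV line with timestamp, power draw (W),
# and GPU temperature (C) every 5 seconds until interrupted with Ctrl-C.
# Run it in a second terminal while inference.py is generating.
watch_gpu() {
  nvidia-smi --query-gpu=timestamp,power.draw,temperature.gpu \
             --format=csv -l 5
}

# Example conversion of a Celsius reading to Fahrenheit.
c_to_f 65
```

Redirecting the output of watch_gpu to a file gives you a log you can check for the actual peak afterwards.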