Hi @paelnever, try setting CUDA_ARCH_FLAG=sm_87 either as an environment variable or in the Makefile:
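For example, something like this should work for a one-off build without editing the Makefile (a sketch; this assumes the Makefile in your whisper.cpp tree actually honors CUDA_ARCH_FLAG, which can vary between versions):

```shell
# sm_87 targets the GPU in Jetson AGX Orin / Orin NX / Orin Nano.
# Passing the variable on the command line overrides the Makefile default
# (assuming the Makefile reads CUDA_ARCH_FLAG; check your version).
CUDA_ARCH_FLAG=sm_87 make -j
```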
Or try cmake and change GGML_CUDA_ARCHITECTURES. I haven’t tried building whisper.cpp, but I have dockerfiles for a few different versions of Whisper here:
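The CMake route would look roughly like this (also a sketch; the exact option names differ between whisper.cpp versions, and older trees may use CMAKE_CUDA_ARCHITECTURES instead):

```shell
# Configure with CUDA enabled and the Orin architecture (compute capability 8.7),
# then build. Option names are assumptions — check `cmake -LA` for your version.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_ARCHITECTURES=87
cmake --build build -j
```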
OK, that edit solved that particular problem; then I had to edit a couple more lines to get whisper.cpp to compile successfully, which was the first step toward compiling talk-llama.
Compiling talk-llama, which was my real objective, hit one small snag that I solved by installing some packages following the instructions here: Pygame on Jetson nano - #9 by user38008
So just run: sudo apt-get install libsdl2-ttf-dev libsdl2-image-dev libsdl2-mixer-dev
Thanks for the help. I hope to try your dockerfiles in the future; I'm looking for the fastest voice-recognition and best-quality TTS models to build a personal voice assistant on the AGX Orin.
Have you tried Riva? It has very fast and efficient ASR/TTS, and uses transformer-based models so the quality is good:
This is what I use in my llamaspeak videos. Riva isn't out yet for JetPack 6 (it will be soon), so currently it only runs on JetPack 5. And if you find whisper.cpp to be faster/better for streaming ASR, that would be good to know, thanks!