Hi @paelnever, try setting CUDA_ARCH_FLAG=sm_87 either as an environment variable or in the Makefile:
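For example, something like this should work for a one-off build without editing the Makefile (a sketch; this assumes the Makefile in your whisper.cpp tree actually honors CUDA_ARCH_FLAG, which can vary between versions):

```shell
# sm_87 targets the GPU in Jetson AGX Orin / Orin NX / Orin Nano.
# Passing the variable on the command line overrides the Makefile default
# (assuming the Makefile reads CUDA_ARCH_FLAG; check your version).
CUDA_ARCH_FLAG=sm_87 make -j
```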
Or try cmake and change GGML_CUDA_ARCHITECTURES. I haven’t tried building whisper.cpp, but I have dockerfiles for a few different versions of Whisper here:
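The CMake route would look roughly like this (also a sketch; the exact option names differ between whisper.cpp versions, and older trees may use CMAKE_CUDA_ARCHITECTURES instead):

```shell
# Configure with CUDA enabled and the Orin architecture (compute capability 8.7),
# then build. Option names are assumptions — check `cmake -LA` for your version.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_ARCHITECTURES=87
cmake --build build -j
```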
OK, that edit solved that particular problem; then I had to edit a couple more lines to get whisper.cpp to compile successfully, which was the first step toward compiling talk-llama.
Compiling talk-llama, which was my real objective, hit one small snag that I solved by installing some packages following the instructions here: Pygame on Jetson nano - #9 by user38008
So just run: sudo apt-get install libsdl2-ttf-dev libsdl2-image-dev libsdl2-mixer-dev
Thanks for the help. I hope to try your dockerfiles in the future; I'm looking for the fastest voice-recognition and best-quality TTS models to build a personal voice assistant on the AGX Orin.
Have you tried Riva? It has very fast and efficient ASR/TTS, and uses transformer-based models so the quality is good:
This is what I use in my llamaspeak videos. Riva isn't out yet for JetPack 6 (it will be soon), so currently it only runs on JetPack 5. And if you find whisper.cpp to be faster/better for streaming ASR, that would be good to know, thanks!