Text-generation-webui install error

Hi, when I try to install text-generation-webui I get the error below. Does anybody know what's going on?

louie001@localhost:~$ jetson-containers run $(autotag text-generation-webui)
Namespace(packages=['text-generation-webui'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.2 JETPACK_VERSION=6.1.1 CUDA_VERSION=12.6
-- Finding compatible container image for ['text-generation-webui']
text-generation-webui:r36.4.2-exllama
V4L2_DEVICES:

DISPLAY environmental variable is already set: ":0"

localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
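
As an aside, the xauth line is a warning, not the failure: the run script creates /tmp/.docker.xauth on the fly. If you want to silence it, a common generic docker/X11 recipe (not specific to jetson-containers) is to pre-populate the file on the host first:

$ touch /tmp/.docker.xauth
$ xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f /tmp/.docker.xauth nmerge -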

Hi,

Please share the complete output log with us.
It’s not clear where the error message is in the current log.

Thanks.

Here it is:
louie001@localhost:~$ jetson-containers run $(autotag text-generation-webui)
Namespace(packages=['text-generation-webui'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.2 JETPACK_VERSION=6.1.1 CUDA_VERSION=12.6
-- Finding compatible container image for ['text-generation-webui']
text-generation-webui:r36.4.2-exllama
V4L2_DEVICES: --device /dev/video0

DISPLAY environmental variable is already set: ":0"

localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist

docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/louie001/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250113_085958 text-generation-webui:r36.4.2-exllama
root@localhost:/#

Hi,

It looks like you have successfully entered the container already.
So you can go ahead with the next command.

https://www.jetson-ai-lab.com/tutorial_text-generation.html#how-to-start

$ cd /opt/text-generation-webui && python3 server.py \
  ...

Thanks.
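
A quick sanity check before launching is to confirm that the directory from the tutorial command actually exists inside the container, e.g.:

$ ls /opt/text-generation-webui/server.py

If that file is missing, the image you entered doesn't include the webui layer.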

Thanks for the reply. The next step won't work; that's why I came here. It seems the directory text-generation-webui doesn't exist:

root@localhost:/# cd /opt/text-generation-webui && python3 server.py
--model-dir=/data/models/text-generation-webui
--chat
--listen
bash: cd: /opt/text-generation-webui: No such file or directory
root@localhost:/# cd /opt
root@localhost:/opt# ls
exllamav2 nvidia wheels
root@localhost:/opt#
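
For what it's worth, the :r36.4.2-exllama tag that autotag picked looks like an intermediate build stage rather than the finished webui image, which would explain why /opt only contains exllamav2 and wheels. On the host you can list which text-generation-webui stages exist locally, e.g.:

$ docker images | grep text-generation-webui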

Hi,

The container might not have been fully built.
Could you run the below command and share the output with us?

$ jetson-containers build text-generation-webui

Thanks.
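
Note that jetson-containers also tees each build stage's full output to a timestamped file on the host (the exact tee target is visible in the build command later in this thread), so you can attach the log file if the terminal scrollback overflows:

$ ls ~/jetson-containers/logs/*/build/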

Here it is, thanks. It's over the size limit, so I removed the first part:

-- Response generated in 10.74 seconds, 128 tokens, 11.92 tokens/second (includes prompt eval.)
-- Building container text-generation-webui:r36.4.2-llama_cpp

DOCKER_BUILDKIT=0 docker build --network=host --tag text-generation-webui:r36.4.2-llama_cpp
--file /home/louie001/jetson-containers/packages/llm/llama_cpp/Dockerfile
--build-arg BASE_IMAGE=text-generation-webui:r36.4.2-exllama
--build-arg LLAMA_CPP_VERSION="0.3.2"
--build-arg LLAMA_CPP_BRANCH="0.3.2"
--build-arg LLAMA_CPP_FLAGS="-DGGML_CUDA=on -DGGML_CUDA_F16=on -DLLAMA_CURL=on"
/home/louie001/jetson-containers/packages/llm/llama_cpp
2>&1 | tee /home/louie001/jetson-containers/logs/20250116_133527/build/text-generation-webui_r36.4.2-llama_cpp.txt; exit ${PIPESTATUS[0]}

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
environment-variable.

Sending build context to Docker daemon 38.91kB
Step 1/7 : ARG BASE_IMAGE
Step 2/7 : FROM ${BASE_IMAGE}
 ---> 6f28995442c9
Step 3/7 : ARG LLAMA_CPP_VERSION LLAMA_CPP_BRANCH LLAMA_CPP_FLAGS FORCE_BUILD=off
 ---> Using cache
 ---> 720cfd7806fc
Step 4/7 : COPY build.sh install.sh /tmp/llama_cpp/
 ---> Using cache
 ---> 597dc9900107
Step 5/7 : COPY benchmark.py /usr/local/bin/llama_cpp_benchmark.py
 ---> Using cache
 ---> e36487692f21
Step 6/7 : RUN /tmp/llama_cpp/install.sh || /tmp/llama_cpp/build.sh
 ---> Running in d6c6a2ac9c92

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

Submodule 'vendor/llama.cpp' (https://github.com/ggerganov/llama.cpp) registered for path 'vendor/llama.cpp'
Cloning into '/opt/llama-cpp-python/vendor/llama.cpp'...
Submodule path 'vendor/llama.cpp': checked out '74d73dc85cc2057446bf63cc37ff649ae7cebd80'
Submodule 'kompute' (https://github.com/nomic-ai/kompute) registered for path 'vendor/llama.cpp/ggml/src/ggml-kompute/kompute'
Cloning into '/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-kompute/kompute'...
From https://github.com/nomic-ai/kompute
 * branch            4565194ed7c32d1d2efa32ceab4d3c6cae006306 -> FETCH_HEAD
Submodule path 'vendor/llama.cpp/ggml/src/ggml-kompute/kompute': checked out '4565194ed7c32d1d2efa32ceab4d3c6cae006306'
CMAKE_ARGS='-DGGML_CUDA=on -DGGML_CUDA_F16=on -DLLAMA_CURL=on -DCMAKE_CUDA_ARCHITECTURES=87'
FORCE_CMAKE=1
pip3 wheel --wheel-dir=/opt/wheels --verbose ./llama-cpp-python
    Looking in indexes: https://pypi.jetson-ai-lab.dev/jp6/cu126
    Processing ./llama-cpp-python
    Installing build dependencies: started
    Running command pip subprocess to install build dependencies
    Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
    Looking in indexes: https://pypi.jetson-ai-lab.dev/jp6/cu126
    Collecting scikit-build-core>=0.9.2 (from scikit-build-core[pyproject]>=0.9.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e1/3ab7ca7c3c6dd/scikit_build_core-0.10.7-py3-none-any.whl (165 kB)
    Collecting exceptiongroup>=1.0 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/311/1b9d131c238be/exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
    Collecting packaging>=21.3 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/09a/bb1bccd265c01/packaging-24.2-py3-none-any.whl (65 kB)
    Collecting pathspec>=0.10.1 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/a0d/503e138a4c123/pathspec-0.12.1-py3-none-any.whl (31 kB)
    Collecting tomli>=1.2.2 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/cb5/5c73c5f440877/tomli-2.2.1-py3-none-any.whl (14 kB)
    Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core
    Successfully installed exceptiongroup-1.2.2 packaging-24.2 pathspec-0.12.1 scikit-build-core-0.10.7 tomli-2.2.1
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Running command Getting requirements to build wheel
    Could not determine CMake version via --version, got '' 'Traceback (most recent call last):\n  File "/usr/local/bin/cmake", line 5, in <module>\n    from cmake import cmake\nModuleNotFoundError: No module named 'cmake'\n'
    Getting requirements to build wheel: finished with status 'done'
    Installing backend dependencies: started
    Running command pip subprocess to install backend dependencies
    Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
    Looking in indexes: https://pypi.jetson-ai-lab.dev/jp6/cu126
    Collecting cmake>=3.21
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/926/d91cae2ba7d2f/cmake-3.31.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (27.1 MB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.1/27.1 MB 33.9 MB/s eta 0:00:00
    Installing collected packages: cmake
    Creating /tmp/pip-build-env-c6tnr357/normal/local/bin
    changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ccmake to 755
    changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cmake to 755
    changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cpack to 755
    changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ctest to 755
    Successfully installed cmake-3.31.4
    Installing backend dependencies: finished with status 'done'
    Preparing metadata (pyproject.toml): started
    Running command Preparing metadata (pyproject.toml)
    *** scikit-build-core 0.10.7 using CMake 3.31.4 (metadata_wheel)
    Preparing metadata (pyproject.toml): finished with status 'done'
    Collecting typing-extensions>=4.5.0 (from llama_cpp_python==0.3.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/04e/5ca0351e0f3f8/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
    Collecting numpy>=1.20.0 (from llama_cpp_python==0.3.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/425/0888bcb96617e/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.3 MB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.3/14.3 MB 54.9 MB/s eta 0:00:00
    Collecting diskcache>=5.6.1 (from llama_cpp_python==0.3.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e3/1b2d5fbad117c/diskcache-5.6.3-py3-none-any.whl (45 kB)
    Collecting jinja2>=2.11.3 (from llama_cpp_python==0.3.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/aba/0f4dc9ed8013c/jinja2-3.1.5-py3-none-any.whl (134 kB)
    Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama_cpp_python==0.3.2)
    Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/38a/9ef736c01fccd/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (21 kB)
    Saved ./wheels/diskcache-5.6.3-py3-none-any.whl
    Saved ./wheels/jinja2-3.1.5-py3-none-any.whl
    Saved ./wheels/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    Saved ./wheels/typing_extensions-4.12.2-py3-none-any.whl
    Saved ./wheels/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    Building wheels for collected packages: llama_cpp_python
    Building wheel for llama_cpp_python (pyproject.toml): started
    Running command Building wheel for llama_cpp_python (pyproject.toml)
    *** scikit-build-core 0.10.7 using CMake 3.31.4 (wheel)
    *** Configuring CMake...
    loading initial cache file /tmp/tmpx6kdk4xl/build/CMakeInit.txt
    -- The C compiler identification is GNU 11.4.0
    -- The CXX compiler identification is GNU 11.4.0
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Found Git: /usr/bin/git (found version "2.34.1")
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
    -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
    -- Found Threads: TRUE
    -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
    -- CMAKE_SYSTEM_PROCESSOR: aarch64
    -- Found OpenMP_C: -fopenmp (found version "4.5")
    -- Found OpenMP_CXX: -fopenmp (found version "4.5")
    -- Found OpenMP: TRUE (found version "4.5")
    -- OpenMP found
    -- Using llamafile
    -- ARM detected
    -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
    -- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
    -- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
    -- Including CPU backend
    CMake Warning at vendor/llama.cpp/ggml/src/ggml-amx/CMakeLists.txt:106 (message):
      AMX requires x86 and gcc version > 11.0. Turning off GGML_AMX.

    -- Found CUDAToolkit: /usr/local/cuda/targets/aarch64-linux/include (found version "12.6.85")
    -- CUDA Toolkit found
    -- Using CUDA architectures: 87
    -- The CUDA compiler identification is NVIDIA 12.6.85 with host compiler GNU 11.4.0
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
    -- Detecting CUDA compile features
    -- Detecting CUDA compile features - done
    -- CUDA host compiler is GNU 11.4.0

    -- Including CUDA backend
    -- Found CURL: /usr/lib/aarch64-linux-gnu/libcurl.so (found version "7.81.0")
    CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
    Call Stack (most recent call first):
    CMakeLists.txt:80 (llama_cpp_python_install_target)
    This warning is for project developers. Use -Wno-dev to suppress it.

    CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
    Call Stack (most recent call first):
    CMakeLists.txt:80 (llama_cpp_python_install_target)
    This warning is for project developers. Use -Wno-dev to suppress it.

    CMake Warning (dev) at CMakeLists.txt:13 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
    Call Stack (most recent call first):
    CMakeLists.txt:81 (llama_cpp_python_install_target)
    This warning is for project developers. Use -Wno-dev to suppress it.

    CMake Warning (dev) at CMakeLists.txt:21 (install):
    Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
    Call Stack (most recent call first):
    CMakeLists.txt:81 (llama_cpp_python_install_target)
    This warning is for project developers. Use -Wno-dev to suppress it.

    -- Configuring done (6.4s)
    -- Generating done (0.1s)
    -- Build files have been written to: /tmp/tmpx6kdk4xl/build
    *** Building project with Ninja...
    Change Dir: '/tmp/tmpx6kdk4xl/build'

    Run Build Command(s): /usr/local/bin/ninja -v
    [1/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-aarch64.c
    [2/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-threading.cpp
    [3/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-alloc.c
    [4/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
    [5/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.c
    [6/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
    [7/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend.cpp
    [8/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
    [9/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
    [10/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml.c
    [11/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
    [12/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
    [13/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
    [14/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
    [15/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
    [16/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
    [17/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-quants.c
    [18/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-base.so -o vendor/llama.cpp/ggml/src/libggml-base.so vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -Wl,-rpath,“$ORIGIN” -lm && :
    [19/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-cpu.so -o vendor/llama.cpp/ggml/src/ggml-cpu/libggml-cpu.so vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -Wl,-rpath,“$ORIGIN” vendor/llama.cpp/ggml/src/libggml-base.so /usr/lib/gcc/aarch64-linux-gnu/11/libgomp.so /usr/lib/aarch64-linux-gnu/libpthread.a && :
    [20/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
    [21/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
    [22/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
    [23/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
    [24/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
    [25/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
    [26/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
    [27/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
    [28/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
    FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
    /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(44): error: identifier "__Poly8x8_t" is undefined
    typedef __Poly8x8_t poly8x8_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(45): error: identifier "__Poly16x4_t" is undefined
    typedef __Poly16x4_t poly16x4_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(58): error: identifier "__Poly8x16_t" is undefined
    typedef __Poly8x16_t poly8x16_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(59): error: identifier "__Poly16x8_t" is undefined
    typedef __Poly16x8_t poly16x8_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(60): error: identifier "__Poly64x2_t" is undefined
    typedef __Poly64x2_t poly64x2_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(61): error: identifier "__Poly64x1_t" is undefined
    typedef __Poly64x1_t poly64x1_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(67): error: identifier "__Poly8_t" is undefined
    typedef __Poly8_t poly8_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(68): error: identifier "__Poly16_t" is undefined
    typedef __Poly16_t poly16_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(69): error: identifier "__Poly64_t" is undefined
    typedef __Poly64_t poly64_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(70): error: identifier "__Poly128_t" is undefined
    typedef __Poly128_t poly128_t;
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(828): error: identifier “__builtin_aarch64_saddlv8qi” is undefined
    return (int16x8_t) __builtin_aarch64_saddlv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(835): error: identifier “__builtin_aarch64_saddlv4hi” is undefined
    return (int32x4_t) __builtin_aarch64_saddlv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(842): error: identifier “__builtin_aarch64_saddlv2si” is undefined
    return (int64x2_t) __builtin_aarch64_saddlv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(849): error: identifier “__builtin_aarch64_uaddlv8qi” is undefined
    return (uint16x8_t) __builtin_aarch64_uaddlv8qi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(857): error: identifier “__builtin_aarch64_uaddlv4hi” is undefined
    return (uint32x4_t) __builtin_aarch64_uaddlv4hi ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(865): error: identifier “__builtin_aarch64_uaddlv2si” is undefined
    return (uint64x2_t) __builtin_aarch64_uaddlv2si ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(873): error: identifier “__builtin_aarch64_saddl2v16qi” is undefined
    return (int16x8_t) __builtin_aarch64_saddl2v16qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(880): error: identifier “__builtin_aarch64_saddl2v8hi” is undefined
    return (int32x4_t) __builtin_aarch64_saddl2v8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(887): error: identifier “__builtin_aarch64_saddl2v4si” is undefined
    return (int64x2_t) __builtin_aarch64_saddl2v4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(894): error: identifier “__builtin_aarch64_uaddl2v16qi” is undefined
    return (uint16x8_t) __builtin_aarch64_uaddl2v16qi ((int8x16_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(902): error: identifier “__builtin_aarch64_uaddl2v8hi” is undefined
    return (uint32x4_t) __builtin_aarch64_uaddl2v8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(910): error: identifier “__builtin_aarch64_uaddl2v4si” is undefined
    return (uint64x2_t) __builtin_aarch64_uaddl2v4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(918): error: identifier “__builtin_aarch64_saddwv8qi” is undefined
    return (int16x8_t) __builtin_aarch64_saddwv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(925): error: identifier “__builtin_aarch64_saddwv4hi” is undefined
    return (int32x4_t) __builtin_aarch64_saddwv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(932): error: identifier “__builtin_aarch64_saddwv2si” is undefined
    return (int64x2_t) __builtin_aarch64_saddwv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(939): error: identifier “__builtin_aarch64_uaddwv8qi” is undefined
    return (uint16x8_t) __builtin_aarch64_uaddwv8qi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(947): error: identifier “__builtin_aarch64_uaddwv4hi” is undefined
    return (uint32x4_t) __builtin_aarch64_uaddwv4hi ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(955): error: identifier “__builtin_aarch64_uaddwv2si” is undefined
    return (uint64x2_t) __builtin_aarch64_uaddwv2si ((int64x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(963): error: identifier “__builtin_aarch64_saddw2v16qi” is undefined
    return (int16x8_t) __builtin_aarch64_saddw2v16qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(970): error: identifier “__builtin_aarch64_saddw2v8hi” is undefined
    return (int32x4_t) __builtin_aarch64_saddw2v8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(977): error: identifier “__builtin_aarch64_saddw2v4si” is undefined
    return (int64x2_t) __builtin_aarch64_saddw2v4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(984): error: identifier “__builtin_aarch64_uaddw2v16qi” is undefined
    return (uint16x8_t) __builtin_aarch64_uaddw2v16qi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(992): error: identifier “__builtin_aarch64_uaddw2v8hi” is undefined
    return (uint32x4_t) __builtin_aarch64_uaddw2v8hi ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1000): error: identifier “__builtin_aarch64_uaddw2v4si” is undefined
    return (uint64x2_t) __builtin_aarch64_uaddw2v4si ((int64x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1008): error: identifier “__builtin_aarch64_shaddv8qi” is undefined
    return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1015): error: identifier “__builtin_aarch64_shaddv4hi” is undefined
    return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1022): error: identifier “__builtin_aarch64_shaddv2si” is undefined
    return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1029): error: identifier “__builtin_aarch64_uhaddv8qi” is undefined
    return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1037): error: identifier “__builtin_aarch64_uhaddv4hi” is undefined
    return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1045): error: identifier “__builtin_aarch64_uhaddv2si” is undefined
    return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1053): error: identifier “__builtin_aarch64_shaddv16qi” is undefined
    return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1060): error: identifier “__builtin_aarch64_shaddv8hi” is undefined
    return (int16x8_t) __builtin_aarch64_shaddv8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1067): error: identifier “__builtin_aarch64_shaddv4si” is undefined
    return (int32x4_t) __builtin_aarch64_shaddv4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1074): error: identifier “__builtin_aarch64_uhaddv16qi” is undefined
    return (uint8x16_t) __builtin_aarch64_uhaddv16qi ((int8x16_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1082): error: identifier “__builtin_aarch64_uhaddv8hi” is undefined
    return (uint16x8_t) __builtin_aarch64_uhaddv8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1090): error: identifier “__builtin_aarch64_uhaddv4si” is undefined
    return (uint32x4_t) __builtin_aarch64_uhaddv4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1098): error: identifier “__builtin_aarch64_srhaddv8qi” is undefined
    return (int8x8_t) __builtin_aarch64_srhaddv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1105): error: identifier “__builtin_aarch64_srhaddv4hi” is undefined
    return (int16x4_t) __builtin_aarch64_srhaddv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1112): error: identifier “__builtin_aarch64_srhaddv2si” is undefined
    return (int32x2_t) __builtin_aarch64_srhaddv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1119): error: identifier “__builtin_aarch64_urhaddv8qi” is undefined
    return (uint8x8_t) __builtin_aarch64_urhaddv8qi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1127): error: identifier “__builtin_aarch64_urhaddv4hi” is undefined
    return (uint16x4_t) __builtin_aarch64_urhaddv4hi ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1135): error: identifier “__builtin_aarch64_urhaddv2si” is undefined
    return (uint32x2_t) __builtin_aarch64_urhaddv2si ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1143): error: identifier “__builtin_aarch64_srhaddv16qi” is undefined
    return (int8x16_t) __builtin_aarch64_srhaddv16qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1150): error: identifier “__builtin_aarch64_srhaddv8hi” is undefined
    return (int16x8_t) __builtin_aarch64_srhaddv8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1157): error: identifier “__builtin_aarch64_srhaddv4si” is undefined
    return (int32x4_t) __builtin_aarch64_srhaddv4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1164): error: identifier “__builtin_aarch64_urhaddv16qi” is undefined
    return (uint8x16_t) __builtin_aarch64_urhaddv16qi ((int8x16_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1172): error: identifier “__builtin_aarch64_urhaddv8hi” is undefined
    return (uint16x8_t) __builtin_aarch64_urhaddv8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1180): error: identifier “__builtin_aarch64_urhaddv4si” is undefined
    return (uint32x4_t) __builtin_aarch64_urhaddv4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1188): error: identifier “__builtin_aarch64_addhnv8hi” is undefined
    return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1195): error: identifier “__builtin_aarch64_addhnv4si” is undefined
    return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1202): error: identifier “__builtin_aarch64_addhnv2di” is undefined
    return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1209): error: identifier “__builtin_aarch64_addhnv8hi” is undefined
    return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1217): error: identifier “__builtin_aarch64_addhnv4si” is undefined
    return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1225): error: identifier “__builtin_aarch64_addhnv2di” is undefined
    return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1233): error: identifier “__builtin_aarch64_raddhnv8hi” is undefined
    return (int8x8_t) __builtin_aarch64_raddhnv8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1240): error: identifier “__builtin_aarch64_raddhnv4si” is undefined
    return (int16x4_t) __builtin_aarch64_raddhnv4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1247): error: identifier “__builtin_aarch64_raddhnv2di” is undefined
    return (int32x2_t) __builtin_aarch64_raddhnv2di (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1254): error: identifier “__builtin_aarch64_raddhnv8hi” is undefined
    return (uint8x8_t) __builtin_aarch64_raddhnv8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1262): error: identifier “__builtin_aarch64_raddhnv4si” is undefined
    return (uint16x4_t) __builtin_aarch64_raddhnv4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1270): error: identifier “__builtin_aarch64_raddhnv2di” is undefined
    return (uint32x2_t) __builtin_aarch64_raddhnv2di ((int64x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1278): error: identifier “__builtin_aarch64_addhn2v8hi” is undefined
    return (int8x16_t) __builtin_aarch64_addhn2v8hi (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1285): error: identifier “__builtin_aarch64_addhn2v4si” is undefined
    return (int16x8_t) __builtin_aarch64_addhn2v4si (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1292): error: identifier “__builtin_aarch64_addhn2v2di” is undefined
    return (int32x4_t) __builtin_aarch64_addhn2v2di (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1299): error: identifier “__builtin_aarch64_addhn2v8hi” is undefined
    return (uint8x16_t) __builtin_aarch64_addhn2v8hi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1308): error: identifier “__builtin_aarch64_addhn2v4si” is undefined
    return (uint16x8_t) __builtin_aarch64_addhn2v4si ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1317): error: identifier “__builtin_aarch64_addhn2v2di” is undefined
    return (uint32x4_t) __builtin_aarch64_addhn2v2di ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1326): error: identifier “__builtin_aarch64_raddhn2v8hi” is undefined
    return (int8x16_t) __builtin_aarch64_raddhn2v8hi (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1333): error: identifier “__builtin_aarch64_raddhn2v4si” is undefined
    return (int16x8_t) __builtin_aarch64_raddhn2v4si (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1340): error: identifier “__builtin_aarch64_raddhn2v2di” is undefined
    return (int32x4_t) __builtin_aarch64_raddhn2v2di (__a, __b, __c);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1347): error: identifier “__builtin_aarch64_raddhn2v8hi” is undefined
    return (uint8x16_t) __builtin_aarch64_raddhn2v8hi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1356): error: identifier “__builtin_aarch64_raddhn2v4si” is undefined
    return (uint16x8_t) __builtin_aarch64_raddhn2v4si ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1365): error: identifier “__builtin_aarch64_raddhn2v2di” is undefined
    return (uint32x4_t) __builtin_aarch64_raddhn2v2di ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1458): error: identifier “__builtin_aarch64_pmulv8qi” is undefined
    return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1522): error: identifier “__builtin_aarch64_pmulv16qi” is undefined
    return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2230): error: identifier “__builtin_aarch64_ssublv8qi” is undefined
    return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2237): error: identifier “__builtin_aarch64_ssublv4hi” is undefined
    return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2244): error: identifier “__builtin_aarch64_ssublv2si” is undefined
    return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2251): error: identifier “__builtin_aarch64_usublv8qi” is undefined
    return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2259): error: identifier “__builtin_aarch64_usublv4hi” is undefined
    return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2267): error: identifier “__builtin_aarch64_usublv2si” is undefined
    return (uint64x2_t) __builtin_aarch64_usublv2si ((int32x2_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2275): error: identifier “__builtin_aarch64_ssubl2v16qi” is undefined
    return (int16x8_t) __builtin_aarch64_ssubl2v16qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2282): error: identifier “__builtin_aarch64_ssubl2v8hi” is undefined
    return (int32x4_t) __builtin_aarch64_ssubl2v8hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2289): error: identifier “__builtin_aarch64_ssubl2v4si” is undefined
    return (int64x2_t) __builtin_aarch64_ssubl2v4si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2296): error: identifier “__builtin_aarch64_usubl2v16qi” is undefined
    return (uint16x8_t) __builtin_aarch64_usubl2v16qi ((int8x16_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2304): error: identifier “__builtin_aarch64_usubl2v8hi” is undefined
    return (uint32x4_t) __builtin_aarch64_usubl2v8hi ((int16x8_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2312): error: identifier “__builtin_aarch64_usubl2v4si” is undefined
    return (uint64x2_t) __builtin_aarch64_usubl2v4si ((int32x4_t) __a,
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2320): error: identifier “__builtin_aarch64_ssubwv8qi” is undefined
    return (int16x8_t) __builtin_aarch64_ssubwv8qi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2327): error: identifier “__builtin_aarch64_ssubwv4hi” is undefined
    return (int32x4_t) __builtin_aarch64_ssubwv4hi (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2334): error: identifier “__builtin_aarch64_ssubwv2si” is undefined
    return (int64x2_t) __builtin_aarch64_ssubwv2si (__a, __b);
    ^

    /usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2341): error: identifier “__builtin_aarch64_usubwv8qi” is undefined
    return (uint16x8_t) __builtin_aarch64_usubwv8qi ((int16x8_t) __a,
    ^

    Error limit reached.
    100 errors detected in the compilation of “/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu”.
    Compilation terminated.
    [29/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/dmmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o
    [30/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
    [31/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
    [32/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
    [33/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
    [34/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
    [35/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/… -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/…/include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 “–generate-code=arch=compute_87,code=[compute_87,sm_87]” -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
    ninja: build stopped: subcommand failed.

    *** CMake build failed
    error: subprocess-exited-with-error

    × Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.
    full command: /usr/bin/python3.10 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp32pi2w7s
    cwd: /opt/llama-cpp-python
    Building wheel for llama_cpp_python (pyproject.toml): finished with status ‘error’
    ERROR: Failed building wheel for llama_cpp_python
    Failed to build llama_cpp_python
    ERROR: Failed to build one or more wheels
    The command ‘/bin/sh -c /tmp/llama_cpp/install.sh || /tmp/llama_cpp/build.sh’ returned a non-zero code: 1
    Traceback (most recent call last):
    File “/usr/lib/python3.10/runpy.py”, line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File “/usr/lib/python3.10/runpy.py”, line 86, in _run_code
    exec(code, run_globals)
    File “/home/louie001/jetson-containers/jetson_containers/build.py”, line 112, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
    File “/home/louie001/jetson-containers/jetson_containers/container.py”, line 147, in build_container
    status = subprocess.run(cmd.replace(NEWLINE, ’ ‘), executable=’/bin/bash’, shell=True, check=True)
    File “/usr/lib/python3.10/subprocess.py”, line 526, in run
    raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command ‘DOCKER_BUILDKIT=0 docker build --network=host --tag text-generation-webui:r36.4.2-llama_cpp --file /home/louie001/jetson-containers/packages/llm/llama_cpp/Dockerfile --build-arg BASE_IMAGE=text-generation-webui:r36.4.2-exllama --build-arg LLAMA_CPP_VERSION=“0.3.2” --build-arg LLAMA_CPP_BRANCH=“0.3.2” --build-arg LLAMA_CPP_FLAGS=“-DGGML_CUDA=on -DGGML_CUDA_F16=on -DLLAMA_CURL=on” /home/louie001/jetson-containers/packages/llm/llama_cpp 2>&1 | tee /home/louie001/jetson-containers/logs/20250116_133527/build/text-generation-webui_r36.4.2-llama_cpp.txt; exit ${PIPESTATUS[0]}’ returned non-zero exit status 1.
    louie001@localhost:~$

Hi,

We tested text-generation-webui with JetPack 6.2 and it runs correctly.
Please see the link below for the details:
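
Before rebuilding, it can help to confirm which JetPack/L4T release the board is actually running. A minimal sketch, assuming the stock nvidia-jetpack apt package and the /etc/nv_tegra_release file are present on the device (both are assumptions about a standard flash):

$ apt-cache show nvidia-jetpack | grep -m1 Version   # installed JetPack release
$ cat /etc/nv_tegra_release                          # L4T release used to match container tags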

Thanks.

Nvidia sucks. 99 percent of the tutorial instructions on its website do not work: either packages are missing, files are not found, or things simply fail even when you follow the posted instructions step by step!

I was facing what looks like a different issue, and I have updated all the details here. Can you please suggest a way to resolve it?

Thanks
Nataraj

Hi,

We tested it on the latest JetPack 6.2 (r36.4.3) and it works normally.
Could you set up that environment and give it a try?
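
As a minimal sketch of the retry, assuming the jetson-containers checkout at ~/jetson-containers seen in the log above:

$ git -C ~/jetson-containers pull                      # pick up the latest package definitions
$ jetson-containers build text-generation-webui        # rebuild against the updated base
$ jetson-containers run $(autotag text-generation-webui)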

Thanks.

Hi, @natarajnvidia

Let’s discuss the question in the new topic directly.
