pip3 wheel --wheel-dir=/opt/wheels --verbose ./llama-cpp-python
Looking in indexes: jp6/cu126 index
Processing ./llama-cpp-python
Installing build dependencies: started
Running command pip subprocess to install build dependencies
Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: jp6/cu126 index
Collecting scikit-build-core>=0.9.2 (from scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e1/3ab7ca7c3c6dd/scikit_build_core-0.10.7-py3-none-any.whl (165 kB)
Collecting exceptiongroup>=1.0 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/311/1b9d131c238be/exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Collecting packaging>=21.3 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/09a/bb1bccd265c01/packaging-24.2-py3-none-any.whl (65 kB)
Collecting pathspec>=0.10.1 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/a0d/503e138a4c123/pathspec-0.12.1-py3-none-any.whl (31 kB)
Collecting tomli>=1.2.2 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/cb5/5c73c5f440877/tomli-2.2.1-py3-none-any.whl (14 kB)
Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core
Successfully installed exceptiongroup-1.2.2 packaging-24.2 pathspec-0.12.1 scikit-build-core-0.10.7 tomli-2.2.1
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
Could not determine CMake version via --version, got '' 'Traceback (most recent call last):\n  File "/usr/local/bin/cmake", line 5, in <module>\n    from cmake import cmake\nModuleNotFoundError: No module named 'cmake'\n'
Getting requirements to build wheel: finished with status 'done'
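The "Could not determine CMake version" warning above comes from /usr/local/bin/cmake being a pip entry-point shim whose backing `cmake` Python module is gone (e.g. after an uninstall), so it crashes on import. The build recovers because pip installs its own CMake into the isolated build environment in the next phase, but the stale shim can be diagnosed with a sketch like this (paths taken from the log; the `grep python` heuristic is an assumption):

```shell
# Check whether the cmake on PATH is a stale pip entry-point shim
# (a Python script that imports the now-missing `cmake` module).
cmake_path=$(command -v cmake || true)
if [ -n "$cmake_path" ] && head -n 1 "$cmake_path" 2>/dev/null | grep -q python; then
    # A working shim reports its version; a stale one tracebacks.
    "$cmake_path" --version >/dev/null 2>&1 || echo "stale cmake shim: $cmake_path"
fi
```

If the shim is confirmed stale, removing it or reinstalling the `cmake` pip package silences the warning; the wheel build itself is unaffected, since scikit-build-core uses the CMake it installs into the build environment.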
Installing backend dependencies: started
Running command pip subprocess to install backend dependencies
Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: jp6/cu126 index
Collecting cmake>=3.21
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/926/d91cae2ba7d2f/cmake-3.31.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (27.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.1/27.1 MB 33.9 MB/s eta 0:00:00
Installing collected packages: cmake
Creating /tmp/pip-build-env-c6tnr357/normal/local/bin
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ccmake to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cmake to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cpack to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ctest to 755
Successfully installed cmake-3.31.4
Installing backend dependencies: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
*** scikit-build-core 0.10.7 using CMake 3.31.4 (metadata_wheel)
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting typing-extensions>=4.5.0 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/04e/5ca0351e0f3f8/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting numpy>=1.20.0 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/425/0888bcb96617e/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.3/14.3 MB 54.9 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e3/1b2d5fbad117c/diskcache-5.6.3-py3-none-any.whl (45 kB)
Collecting jinja2>=2.11.3 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/aba/0f4dc9ed8013c/jinja2-3.1.5-py3-none-any.whl (134 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/38a/9ef736c01fccd/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (21 kB)
Saved ./wheels/diskcache-5.6.3-py3-none-any.whl
Saved ./wheels/jinja2-3.1.5-py3-none-any.whl
Saved ./wheels/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Saved ./wheels/typing_extensions-4.12.2-py3-none-any.whl
Saved ./wheels/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Building wheels for collected packages: llama_cpp_python
Building wheel for llama_cpp_python (pyproject.toml): started
Running command Building wheel for llama_cpp_python (pyproject.toml)
*** scikit-build-core 0.10.7 using CMake 3.31.4 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpx6kdk4xl/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP found
-- Using llamafile
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at vendor/llama.cpp/ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0. Turning off GGML_AMX.
-- Found CUDAToolkit: /usr/local/cuda/targets/aarch64-linux/include (found version "12.6.85")
-- CUDA Toolkit found
-- Using CUDA architectures: 87
-- The CUDA compiler identification is NVIDIA 12.6.85 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 11.4.0
-- Including CUDA backend
-- Found CURL: /usr/lib/aarch64-linux-gnu/libcurl.so (found version "7.81.0")
CMake Warning (dev) at CMakeLists.txt:13 (install):
  Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:80 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
  Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:80 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:13 (install):
  Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:81 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
  Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:81 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (6.4s)
-- Generating done (0.1s)
-- Build files have been written to: /tmp/tmpx6kdk4xl/build
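The configure step detected CUDA architecture 87 (Jetson Orin) on its own. If the build ever needs explicit CMake options, llama-cpp-python's scikit-build-core backend reads them from the CMAKE_ARGS environment variable; a hedged sketch (flag names assumed from current llama.cpp, where the CUDA switch is GGML_CUDA; older trees used LLAMA_CUBLAS):

```shell
# Force the CUDA backend and pin the GPU architecture for the wheel build.
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=87"
export CMAKE_ARGS
# then re-run the build, e.g.:
#   pip3 wheel --wheel-dir=/opt/wheels --verbose ./llama-cpp-python
```

This is only needed when auto-detection picks the wrong architecture or the CUDA backend is not enabled by default.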
*** Building project with Ninja...
Change Dir: '/tmp/tmpx6kdk4xl/build'
Run Build Command(s): /usr/local/bin/ninja -v
[1/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-aarch64.c
[2/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-threading.cpp
[3/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-alloc.c
[4/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
[5/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.c
[6/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
[7/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend.cpp
[8/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
[9/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
[10/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml.c
[11/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
[12/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
[13/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
[14/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
[15/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
[16/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
[17/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-quants.c
[18/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-base.so -o vendor/llama.cpp/ggml/src/libggml-base.so vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -Wl,-rpath,"$ORIGIN" -lm && :
[19/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-cpu.so -o vendor/llama.cpp/ggml/src/ggml-cpu/libggml-cpu.so vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -Wl,-rpath,"$ORIGIN" vendor/llama.cpp/ggml/src/libggml-base.so /usr/lib/gcc/aarch64-linux-gnu/11/libgomp.so /usr/lib/aarch64-linux-gnu/libpthread.a && :
[20/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
[21/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
[22/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
[23/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
[24/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[25/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
[26/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
[27/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
[28/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(44): error: identifier "__Poly8x8_t" is undefined
typedef __Poly8x8_t poly8x8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(45): error: identifier "__Poly16x4_t" is undefined
typedef __Poly16x4_t poly16x4_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(58): error: identifier "__Poly8x16_t" is undefined
typedef __Poly8x16_t poly8x16_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(59): error: identifier "__Poly16x8_t" is undefined
typedef __Poly16x8_t poly16x8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(60): error: identifier "__Poly64x2_t" is undefined
typedef __Poly64x2_t poly64x2_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(61): error: identifier "__Poly64x1_t" is undefined
typedef __Poly64x1_t poly64x1_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(67): error: identifier "__Poly8_t" is undefined
typedef __Poly8_t poly8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(68): error: identifier "__Poly16_t" is undefined
typedef __Poly16_t poly16_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(69): error: identifier "__Poly64_t" is undefined
typedef __Poly64_t poly64_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(70): error: identifier "__Poly128_t" is undefined
typedef __Poly128_t poly128_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(828): error: identifier "__builtin_aarch64_saddlv8qi" is undefined
return (int16x8_t) __builtin_aarch64_saddlv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(835): error: identifier "__builtin_aarch64_saddlv4hi" is undefined
return (int32x4_t) __builtin_aarch64_saddlv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(842): error: identifier "__builtin_aarch64_saddlv2si" is undefined
return (int64x2_t) __builtin_aarch64_saddlv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(849): error: identifier "__builtin_aarch64_uaddlv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddlv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(857): error: identifier "__builtin_aarch64_uaddlv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddlv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(865): error: identifier "__builtin_aarch64_uaddlv2si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddlv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(873): error: identifier "__builtin_aarch64_saddl2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_saddl2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(880): error: identifier "__builtin_aarch64_saddl2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_saddl2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(887): error: identifier "__builtin_aarch64_saddl2v4si" is undefined
return (int64x2_t) __builtin_aarch64_saddl2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(894): error: identifier "__builtin_aarch64_uaddl2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddl2v16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(902): error: identifier "__builtin_aarch64_uaddl2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddl2v8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(910): error: identifier "__builtin_aarch64_uaddl2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddl2v4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(918): error: identifier "__builtin_aarch64_saddwv8qi" is undefined
return (int16x8_t) __builtin_aarch64_saddwv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(925): error: identifier "__builtin_aarch64_saddwv4hi" is undefined
return (int32x4_t) __builtin_aarch64_saddwv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(932): error: identifier "__builtin_aarch64_saddwv2si" is undefined
return (int64x2_t) __builtin_aarch64_saddwv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(939): error: identifier "__builtin_aarch64_uaddwv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddwv8qi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(947): error: identifier "__builtin_aarch64_uaddwv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddwv4hi ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(955): error: identifier "__builtin_aarch64_uaddwv2si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddwv2si ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(963): error: identifier "__builtin_aarch64_saddw2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_saddw2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(970): error: identifier "__builtin_aarch64_saddw2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_saddw2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(977): error: identifier "__builtin_aarch64_saddw2v4si" is undefined
return (int64x2_t) __builtin_aarch64_saddw2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(984): error: identifier "__builtin_aarch64_uaddw2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddw2v16qi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(992): error: identifier "__builtin_aarch64_uaddw2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddw2v8hi ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1000): error: identifier "__builtin_aarch64_uaddw2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddw2v4si ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1008): error: identifier "__builtin_aarch64_shaddv8qi" is undefined
return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1015): error: identifier "__builtin_aarch64_shaddv4hi" is undefined
return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1022): error: identifier "__builtin_aarch64_shaddv2si" is undefined
return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1029): error: identifier "__builtin_aarch64_uhaddv8qi" is undefined
return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1037): error: identifier "__builtin_aarch64_uhaddv4hi" is undefined
return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1045): error: identifier "__builtin_aarch64_uhaddv2si" is undefined
return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1053): error: identifier "__builtin_aarch64_shaddv16qi" is undefined
return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1060): error: identifier "__builtin_aarch64_shaddv8hi" is undefined
return (int16x8_t) __builtin_aarch64_shaddv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1067): error: identifier "__builtin_aarch64_shaddv4si" is undefined
return (int32x4_t) __builtin_aarch64_shaddv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1074): error: identifier "__builtin_aarch64_uhaddv16qi" is undefined
return (uint8x16_t) __builtin_aarch64_uhaddv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1082): error: identifier "__builtin_aarch64_uhaddv8hi" is undefined
return (uint16x8_t) __builtin_aarch64_uhaddv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1090): error: identifier "__builtin_aarch64_uhaddv4si" is undefined
return (uint32x4_t) __builtin_aarch64_uhaddv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1098): error: identifier "__builtin_aarch64_srhaddv8qi" is undefined
return (int8x8_t) __builtin_aarch64_srhaddv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1105): error: identifier "__builtin_aarch64_srhaddv4hi" is undefined
return (int16x4_t) __builtin_aarch64_srhaddv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1112): error: identifier "__builtin_aarch64_srhaddv2si" is undefined
return (int32x2_t) __builtin_aarch64_srhaddv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1119): error: identifier "__builtin_aarch64_urhaddv8qi" is undefined
return (uint8x8_t) __builtin_aarch64_urhaddv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1127): error: identifier "__builtin_aarch64_urhaddv4hi" is undefined
return (uint16x4_t) __builtin_aarch64_urhaddv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1135): error: identifier "__builtin_aarch64_urhaddv2si" is undefined
return (uint32x2_t) __builtin_aarch64_urhaddv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1143): error: identifier "__builtin_aarch64_srhaddv16qi" is undefined
return (int8x16_t) __builtin_aarch64_srhaddv16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1150): error: identifier "__builtin_aarch64_srhaddv8hi" is undefined
return (int16x8_t) __builtin_aarch64_srhaddv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1157): error: identifier "__builtin_aarch64_srhaddv4si" is undefined
return (int32x4_t) __builtin_aarch64_srhaddv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1164): error: identifier "__builtin_aarch64_urhaddv16qi" is undefined
return (uint8x16_t) __builtin_aarch64_urhaddv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1172): error: identifier "__builtin_aarch64_urhaddv8hi" is undefined
return (uint16x8_t) __builtin_aarch64_urhaddv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1180): error: identifier "__builtin_aarch64_urhaddv4si" is undefined
return (uint32x4_t) __builtin_aarch64_urhaddv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1188): error: identifier "__builtin_aarch64_addhnv8hi" is undefined
return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1195): error: identifier "__builtin_aarch64_addhnv4si" is undefined
return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1202): error: identifier "__builtin_aarch64_addhnv2di" is undefined
return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1209): error: identifier "__builtin_aarch64_addhnv8hi" is undefined
return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1217): error: identifier "__builtin_aarch64_addhnv4si" is undefined
return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1225): error: identifier "__builtin_aarch64_addhnv2di" is undefined
return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1233): error: identifier "__builtin_aarch64_raddhnv8hi" is undefined
return (int8x8_t) __builtin_aarch64_raddhnv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1240): error: identifier "__builtin_aarch64_raddhnv4si" is undefined
return (int16x4_t) __builtin_aarch64_raddhnv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1247): error: identifier "__builtin_aarch64_raddhnv2di" is undefined
return (int32x2_t) __builtin_aarch64_raddhnv2di (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1254): error: identifier "__builtin_aarch64_raddhnv8hi" is undefined
return (uint8x8_t) __builtin_aarch64_raddhnv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1262): error: identifier "__builtin_aarch64_raddhnv4si" is undefined
return (uint16x4_t) __builtin_aarch64_raddhnv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1270): error: identifier "__builtin_aarch64_raddhnv2di" is undefined
return (uint32x2_t) __builtin_aarch64_raddhnv2di ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1278): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
return (int8x16_t) __builtin_aarch64_addhn2v8hi (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1285): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
return (int16x8_t) __builtin_aarch64_addhn2v4si (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1292): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
return (int32x4_t) __builtin_aarch64_addhn2v2di (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1299): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
return (uint8x16_t) __builtin_aarch64_addhn2v8hi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1308): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
return (uint16x8_t) __builtin_aarch64_addhn2v4si ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1317): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
return (uint32x4_t) __builtin_aarch64_addhn2v2di ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1326): error: identifier "__builtin_aarch64_raddhn2v8hi" is undefined
return (int8x16_t) __builtin_aarch64_raddhn2v8hi (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1333): error: identifier "__builtin_aarch64_raddhn2v4si" is undefined
return (int16x8_t) __builtin_aarch64_raddhn2v4si (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1340): error: identifier "__builtin_aarch64_raddhn2v2di" is undefined
return (int32x4_t) __builtin_aarch64_raddhn2v2di (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1347): error: identifier "__builtin_aarch64_raddhn2v8hi" is undefined
return (uint8x16_t) __builtin_aarch64_raddhn2v8hi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1356): error: identifier "__builtin_aarch64_raddhn2v4si" is undefined
return (uint16x8_t) __builtin_aarch64_raddhn2v4si ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1365): error: identifier "__builtin_aarch64_raddhn2v2di" is undefined
return (uint32x4_t) __builtin_aarch64_raddhn2v2di ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1458): error: identifier "__builtin_aarch64_pmulv8qi" is undefined
return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1522): error: identifier "__builtin_aarch64_pmulv16qi" is undefined
return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2230): error: identifier "__builtin_aarch64_ssublv8qi" is undefined
return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2237): error: identifier "__builtin_aarch64_ssublv4hi" is undefined
return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2244): error: identifier "__builtin_aarch64_ssublv2si" is undefined
return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2251): error: identifier "__builtin_aarch64_usublv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2259): error: identifier "__builtin_aarch64_usublv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2267): error: identifier "__builtin_aarch64_usublv2si" is undefined
return (uint64x2_t) __builtin_aarch64_usublv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2275): error: identifier "__builtin_aarch64_ssubl2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_ssubl2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2282): error: identifier "__builtin_aarch64_ssubl2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_ssubl2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2289): error: identifier "__builtin_aarch64_ssubl2v4si" is undefined
return (int64x2_t) __builtin_aarch64_ssubl2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2296): error: identifier "__builtin_aarch64_usubl2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_usubl2v16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2304): error: identifier "__builtin_aarch64_usubl2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_usubl2v8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2312): error: identifier "__builtin_aarch64_usubl2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_usubl2v4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2320): error: identifier "__builtin_aarch64_ssubwv8qi" is undefined
return (int16x8_t) __builtin_aarch64_ssubwv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2327): error: identifier "__builtin_aarch64_ssubwv4hi" is undefined
return (int32x4_t) __builtin_aarch64_ssubwv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2334): error: identifier "__builtin_aarch64_ssubwv2si" is undefined
return (int64x2_t) __builtin_aarch64_ssubwv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2341): error: identifier "__builtin_aarch64_usubwv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_usubwv8qi ((int16x8_t) __a,
^
Error limit reached.
100 errors detected in the compilation of "/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu".
Compilation terminated.
[29/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/dmmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o
[30/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
[31/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
[32/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
[33/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
[34/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
[35/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
ninja: build stopped: subcommand failed.
*** CMake build failed
error: subprocess-exited-with-error
× Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3.10 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp32pi2w7s
cwd: /opt/llama-cpp-python
Building wheel for llama_cpp_python (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for llama_cpp_python
Failed to build llama_cpp_python
ERROR: Failed to build one or more wheels
The command '/bin/sh -c /tmp/llama_cpp/install.sh || /tmp/llama_cpp/build.sh' returned a non-zero code: 1
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/louie001/jetson-containers/jetson_containers/build.py", line 112, in <module>
build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
File "/home/louie001/jetson-containers/jetson_containers/container.py", line 147, in build_container
status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag text-generation-webui:r36.4.2-llama_cpp --file /home/louie001/jetson-containers/packages/llm/llama_cpp/Dockerfile --build-arg BASE_IMAGE=text-generation-webui:r36.4.2-exllama --build-arg LLAMA_CPP_VERSION="0.3.2" --build-arg LLAMA_CPP_BRANCH="0.3.2" --build-arg LLAMA_CPP_FLAGS="-DGGML_CUDA=on -DGGML_CUDA_F16=on -DLLAMA_CURL=on" /home/louie001/jetson-containers/packages/llm/llama_cpp 2>&1 | tee /home/louie001/jetson-containers/logs/20250116_133527/build/text-generation-webui_r36.4.2-llama_cpp.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
louie001@localhost:~$