pip3 wheel --wheel-dir=/opt/wheels --verbose ./llama-cpp-python
Looking in indexes: jp6/cu126 index
Processing ./llama-cpp-python
Installing build dependencies: started
Running command pip subprocess to install build dependencies
Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: jp6/cu126 index
Collecting scikit-build-core>=0.9.2 (from scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e1/3ab7ca7c3c6dd/scikit_build_core-0.10.7-py3-none-any.whl (165 kB)
Collecting exceptiongroup>=1.0 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/311/1b9d131c238be/exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Collecting packaging>=21.3 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/09a/bb1bccd265c01/packaging-24.2-py3-none-any.whl (65 kB)
Collecting pathspec>=0.10.1 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/a0d/503e138a4c123/pathspec-0.12.1-py3-none-any.whl (31 kB)
Collecting tomli>=1.2.2 (from scikit-build-core>=0.9.2->scikit-build-core[pyproject]>=0.9.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/cb5/5c73c5f440877/tomli-2.2.1-py3-none-any.whl (14 kB)
Installing collected packages: tomli, pathspec, packaging, exceptiongroup, scikit-build-core
Successfully installed exceptiongroup-1.2.2 packaging-24.2 pathspec-0.12.1 scikit-build-core-0.10.7 tomli-2.2.1
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Running command Getting requirements to build wheel
Could not determine CMake version via --version, got '' 'Traceback (most recent call last):\n  File "/usr/local/bin/cmake", line 5, in <module>\n    from cmake import cmake\nModuleNotFoundError: No module named 'cmake'\n'
Getting requirements to build wheel: finished with status 'done'
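The "Could not determine CMake version" warning above comes from /usr/local/bin/cmake being a pip entry-point shim whose backing `cmake` Python module is gone (e.g. after an uninstall), so it crashes on import. The build recovers because pip installs its own CMake into the isolated build environment in the next phase, but the stale shim can be diagnosed with a sketch like this (paths taken from the log; the `grep python` heuristic is an assumption):

```shell
# Check whether the cmake on PATH is a stale pip entry-point shim
# (a Python script that imports the now-missing `cmake` module).
cmake_path=$(command -v cmake || true)
if [ -n "$cmake_path" ] && head -n 1 "$cmake_path" 2>/dev/null | grep -q python; then
    # A working shim reports its version; a stale one tracebacks.
    "$cmake_path" --version >/dev/null 2>&1 || echo "stale cmake shim: $cmake_path"
fi
```

If the shim is confirmed stale, removing it or reinstalling the `cmake` pip package silences the warning; the wheel build itself is unaffected, since scikit-build-core uses the CMake it installs into the build environment.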
Installing backend dependencies: started
Running command pip subprocess to install backend dependencies
Using pip 24.3.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: jp6/cu126 index
Collecting cmake>=3.21
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/926/d91cae2ba7d2f/cmake-3.31.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (27.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.1/27.1 MB 33.9 MB/s eta 0:00:00
Installing collected packages: cmake
Creating /tmp/pip-build-env-c6tnr357/normal/local/bin
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ccmake to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cmake to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/cpack to 755
changing mode of /tmp/pip-build-env-c6tnr357/normal/local/bin/ctest to 755
Successfully installed cmake-3.31.4
Installing backend dependencies: finished with status 'done'
Preparing metadata (pyproject.toml): started
Running command Preparing metadata (pyproject.toml)
*** scikit-build-core 0.10.7 using CMake 3.31.4 (metadata_wheel)
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting typing-extensions>=4.5.0 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/04e/5ca0351e0f3f8/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting numpy>=1.20.0 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/425/0888bcb96617e/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.3/14.3 MB 54.9 MB/s eta 0:00:00
Collecting diskcache>=5.6.1 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/5e3/1b2d5fbad117c/diskcache-5.6.3-py3-none-any.whl (45 kB)
Collecting jinja2>=2.11.3 (from llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/aba/0f4dc9ed8013c/jinja2-3.1.5-py3-none-any.whl (134 kB)
Collecting MarkupSafe>=2.0 (from jinja2>=2.11.3->llama_cpp_python==0.3.2)
Downloading https://pypi.jetson-ai-lab.dev/root/pypi/%2Bf/38a/9ef736c01fccd/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (21 kB)
Saved ./wheels/diskcache-5.6.3-py3-none-any.whl
Saved ./wheels/jinja2-3.1.5-py3-none-any.whl
Saved ./wheels/numpy-2.2.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Saved ./wheels/typing_extensions-4.12.2-py3-none-any.whl
Saved ./wheels/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Building wheels for collected packages: llama_cpp_python
Building wheel for llama_cpp_python (pyproject.toml): started
Running command Building wheel for llama_cpp_python (pyproject.toml)
*** scikit-build-core 0.10.7 using CMake 3.31.4 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpx6kdk4xl/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- OpenMP found
-- Using llamafile
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Using runtime weight conversion of Q4_0 to Q4_0_x_x to enable optimized GEMM/GEMV kernels
-- Including CPU backend
CMake Warning at vendor/llama.cpp/ggml/src/ggml-amx/CMakeLists.txt:106 (message):
  AMX requires x86 and gcc version > 11.0. Turning off GGML_AMX.
-- Found CUDAToolkit: /usr/local/cuda/targets/aarch64-linux/include (found version "12.6.85")
-- CUDA Toolkit found
-- Using CUDA architectures: 87
-- The CUDA compiler identification is NVIDIA 12.6.85 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 11.4.0
-- Including CUDA backend
-- Found CURL: /usr/lib/aarch64-linux-gnu/libcurl.so (found version "7.81.0")
CMake Warning (dev) at CMakeLists.txt:13 (install):
  Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:80 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
  Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:80 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:13 (install):
  Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:81 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:21 (install):
  Target ggml has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
Call Stack (most recent call first):
  CMakeLists.txt:81 (llama_cpp_python_install_target)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring done (6.4s)
-- Generating done (0.1s)
-- Build files have been written to: /tmp/tmpx6kdk4xl/build
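The configure step detected CUDA architecture 87 (Jetson Orin) on its own. If the build ever needs explicit CMake options, llama-cpp-python's scikit-build-core backend reads them from the CMAKE_ARGS environment variable; a hedged sketch (flag names assumed from current llama.cpp, where the CUDA switch is GGML_CUDA; older trees used LLAMA_CUBLAS):

```shell
# Force the CUDA backend and pin the GPU architecture for the wheel build.
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=87"
export CMAKE_ARGS
# then re-run the build, e.g.:
#   pip3 wheel --wheel-dir=/opt/wheels --verbose ./llama-cpp-python
```

This is only needed when auto-detection picks the wrong architecture or the CUDA backend is not enabled by default.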
*** Building project with Ninja...
Change Dir: '/tmp/tmpx6kdk4xl/build'
Run Build Command(s): /usr/local/bin/ninja -v
[1/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-aarch64.c
[2/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-threading.cpp
[3/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-alloc.c
[4/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_SHARED -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU -DGGML_USE_CUDA -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend-reg.cpp
[5/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.c
[6/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.cpp
[7/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-backend.cpp
[8/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-quants.c
[9/111] /usr/bin/aarch64-linux-gnu-g++ -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-array-bounds -Wextra-semi -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp
[10/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml.c
[11/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/arange.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
[12/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/acc.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
[13/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argsort.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
[14/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/argmax.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
[15/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DGGML_USE_CPU_AARCH64 -DGGML_USE_LLAMAFILE -DGGML_USE_OPENMP -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cpu_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -fopenmp -MD -MT vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -MF vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o.d -o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c
[16/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/clamp.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
[17/111] /usr/bin/aarch64-linux-gnu-gcc -DGGML_BUILD -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_base_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -O3 -DNDEBUG -std=gnu11 -fPIC -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wdouble-promotion -MD -MT vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -MF vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o.d -o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-quants.c
[18/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-base.so -o vendor/llama.cpp/ggml/src/libggml-base.so vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o vendor/llama.cpp/ggml/src/CMakeFiles/ggml-base.dir/ggml-aarch64.c.o -Wl,-rpath,"$ORIGIN" -lm && :
[19/111] : && /usr/bin/aarch64-linux-gnu-g++ -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libggml-cpu.so -o vendor/llama.cpp/ggml/src/ggml-cpu/libggml-cpu.so vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu.cpp.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-aarch64.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/ggml-cpu-quants.c.o vendor/llama.cpp/ggml/src/ggml-cpu/CMakeFiles/ggml-cpu.dir/llamafile/sgemm.cpp.o -Wl,-rpath,"$ORIGIN" vendor/llama.cpp/ggml/src/libggml-base.so /usr/lib/gcc/aarch64-linux-gnu/11/libgomp.so /usr/lib/aarch64-linux-gnu/libpthread.a && :
[20/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/conv-transpose-1d.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
[21/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/concat.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
[22/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/count-equal.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
[23/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/binbcast.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
[24/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cross-entropy-loss.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[25/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/diagmask.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
[26/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/cpy.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
[27/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/convert.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
[28/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
FAILED: vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(44): error: identifier "__Poly8x8_t" is undefined
typedef __Poly8x8_t poly8x8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(45): error: identifier "__Poly16x4_t" is undefined
typedef __Poly16x4_t poly16x4_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(58): error: identifier "__Poly8x16_t" is undefined
typedef __Poly8x16_t poly8x16_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(59): error: identifier "__Poly16x8_t" is undefined
typedef __Poly16x8_t poly16x8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(60): error: identifier "__Poly64x2_t" is undefined
typedef __Poly64x2_t poly64x2_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(61): error: identifier "__Poly64x1_t" is undefined
typedef __Poly64x1_t poly64x1_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(67): error: identifier "__Poly8_t" is undefined
typedef __Poly8_t poly8_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(68): error: identifier "__Poly16_t" is undefined
typedef __Poly16_t poly16_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(69): error: identifier "__Poly64_t" is undefined
typedef __Poly64_t poly64_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(70): error: identifier "__Poly128_t" is undefined
typedef __Poly128_t poly128_t;
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(828): error: identifier "__builtin_aarch64_saddlv8qi" is undefined
return (int16x8_t) __builtin_aarch64_saddlv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(835): error: identifier "__builtin_aarch64_saddlv4hi" is undefined
return (int32x4_t) __builtin_aarch64_saddlv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(842): error: identifier "__builtin_aarch64_saddlv2si" is undefined
return (int64x2_t) __builtin_aarch64_saddlv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(849): error: identifier "__builtin_aarch64_uaddlv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddlv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(857): error: identifier "__builtin_aarch64_uaddlv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddlv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(865): error: identifier "__builtin_aarch64_uaddlv2si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddlv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(873): error: identifier "__builtin_aarch64_saddl2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_saddl2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(880): error: identifier "__builtin_aarch64_saddl2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_saddl2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(887): error: identifier "__builtin_aarch64_saddl2v4si" is undefined
return (int64x2_t) __builtin_aarch64_saddl2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(894): error: identifier "__builtin_aarch64_uaddl2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddl2v16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(902): error: identifier "__builtin_aarch64_uaddl2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddl2v8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(910): error: identifier "__builtin_aarch64_uaddl2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddl2v4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(918): error: identifier "__builtin_aarch64_saddwv8qi" is undefined
return (int16x8_t) __builtin_aarch64_saddwv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(925): error: identifier "__builtin_aarch64_saddwv4hi" is undefined
return (int32x4_t) __builtin_aarch64_saddwv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(932): error: identifier "__builtin_aarch64_saddwv2si" is undefined
return (int64x2_t) __builtin_aarch64_saddwv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(939): error: identifier "__builtin_aarch64_uaddwv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddwv8qi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(947): error: identifier "__builtin_aarch64_uaddwv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddwv4hi ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(955): error: identifier "__builtin_aarch64_uaddwv2si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddwv2si ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(963): error: identifier "__builtin_aarch64_saddw2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_saddw2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(970): error: identifier "__builtin_aarch64_saddw2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_saddw2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(977): error: identifier "__builtin_aarch64_saddw2v4si" is undefined
return (int64x2_t) __builtin_aarch64_saddw2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(984): error: identifier "__builtin_aarch64_uaddw2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_uaddw2v16qi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(992): error: identifier "__builtin_aarch64_uaddw2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_uaddw2v8hi ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1000): error: identifier "__builtin_aarch64_uaddw2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_uaddw2v4si ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1008): error: identifier "__builtin_aarch64_shaddv8qi" is undefined
return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1015): error: identifier "__builtin_aarch64_shaddv4hi" is undefined
return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1022): error: identifier "__builtin_aarch64_shaddv2si" is undefined
return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1029): error: identifier "__builtin_aarch64_uhaddv8qi" is undefined
return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1037): error: identifier "__builtin_aarch64_uhaddv4hi" is undefined
return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1045): error: identifier "__builtin_aarch64_uhaddv2si" is undefined
return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1053): error: identifier "__builtin_aarch64_shaddv16qi" is undefined
return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1060): error: identifier "__builtin_aarch64_shaddv8hi" is undefined
return (int16x8_t) __builtin_aarch64_shaddv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1067): error: identifier "__builtin_aarch64_shaddv4si" is undefined
return (int32x4_t) __builtin_aarch64_shaddv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1074): error: identifier "__builtin_aarch64_uhaddv16qi" is undefined
return (uint8x16_t) __builtin_aarch64_uhaddv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1082): error: identifier "__builtin_aarch64_uhaddv8hi" is undefined
return (uint16x8_t) __builtin_aarch64_uhaddv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1090): error: identifier "__builtin_aarch64_uhaddv4si" is undefined
return (uint32x4_t) __builtin_aarch64_uhaddv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1098): error: identifier "__builtin_aarch64_srhaddv8qi" is undefined
return (int8x8_t) __builtin_aarch64_srhaddv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1105): error: identifier "__builtin_aarch64_srhaddv4hi" is undefined
return (int16x4_t) __builtin_aarch64_srhaddv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1112): error: identifier "__builtin_aarch64_srhaddv2si" is undefined
return (int32x2_t) __builtin_aarch64_srhaddv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1119): error: identifier "__builtin_aarch64_urhaddv8qi" is undefined
return (uint8x8_t) __builtin_aarch64_urhaddv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1127): error: identifier "__builtin_aarch64_urhaddv4hi" is undefined
return (uint16x4_t) __builtin_aarch64_urhaddv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1135): error: identifier "__builtin_aarch64_urhaddv2si" is undefined
return (uint32x2_t) __builtin_aarch64_urhaddv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1143): error: identifier "__builtin_aarch64_srhaddv16qi" is undefined
return (int8x16_t) __builtin_aarch64_srhaddv16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1150): error: identifier "__builtin_aarch64_srhaddv8hi" is undefined
return (int16x8_t) __builtin_aarch64_srhaddv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1157): error: identifier "__builtin_aarch64_srhaddv4si" is undefined
return (int32x4_t) __builtin_aarch64_srhaddv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1164): error: identifier "__builtin_aarch64_urhaddv16qi" is undefined
return (uint8x16_t) __builtin_aarch64_urhaddv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1172): error: identifier "__builtin_aarch64_urhaddv8hi" is undefined
return (uint16x8_t) __builtin_aarch64_urhaddv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1180): error: identifier "__builtin_aarch64_urhaddv4si" is undefined
return (uint32x4_t) __builtin_aarch64_urhaddv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1188): error: identifier "__builtin_aarch64_addhnv8hi" is undefined
return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1195): error: identifier "__builtin_aarch64_addhnv4si" is undefined
return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1202): error: identifier "__builtin_aarch64_addhnv2di" is undefined
return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1209): error: identifier "__builtin_aarch64_addhnv8hi" is undefined
return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1217): error: identifier "__builtin_aarch64_addhnv4si" is undefined
return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1225): error: identifier "__builtin_aarch64_addhnv2di" is undefined
return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1233): error: identifier "__builtin_aarch64_raddhnv8hi" is undefined
return (int8x8_t) __builtin_aarch64_raddhnv8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1240): error: identifier "__builtin_aarch64_raddhnv4si" is undefined
return (int16x4_t) __builtin_aarch64_raddhnv4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1247): error: identifier "__builtin_aarch64_raddhnv2di" is undefined
return (int32x2_t) __builtin_aarch64_raddhnv2di (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1254): error: identifier "__builtin_aarch64_raddhnv8hi" is undefined
return (uint8x8_t) __builtin_aarch64_raddhnv8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1262): error: identifier "__builtin_aarch64_raddhnv4si" is undefined
return (uint16x4_t) __builtin_aarch64_raddhnv4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1270): error: identifier "__builtin_aarch64_raddhnv2di" is undefined
return (uint32x2_t) __builtin_aarch64_raddhnv2di ((int64x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1278): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
return (int8x16_t) __builtin_aarch64_addhn2v8hi (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1285): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
return (int16x8_t) __builtin_aarch64_addhn2v4si (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1292): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
return (int32x4_t) __builtin_aarch64_addhn2v2di (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1299): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
return (uint8x16_t) __builtin_aarch64_addhn2v8hi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1308): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
return (uint16x8_t) __builtin_aarch64_addhn2v4si ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1317): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
return (uint32x4_t) __builtin_aarch64_addhn2v2di ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1326): error: identifier "__builtin_aarch64_raddhn2v8hi" is undefined
return (int8x16_t) __builtin_aarch64_raddhn2v8hi (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1333): error: identifier "__builtin_aarch64_raddhn2v4si" is undefined
return (int16x8_t) __builtin_aarch64_raddhn2v4si (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1340): error: identifier "__builtin_aarch64_raddhn2v2di" is undefined
return (int32x4_t) __builtin_aarch64_raddhn2v2di (__a, __b, __c);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1347): error: identifier "__builtin_aarch64_raddhn2v8hi" is undefined
return (uint8x16_t) __builtin_aarch64_raddhn2v8hi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1356): error: identifier "__builtin_aarch64_raddhn2v4si" is undefined
return (uint16x8_t) __builtin_aarch64_raddhn2v4si ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1365): error: identifier "__builtin_aarch64_raddhn2v2di" is undefined
return (uint32x4_t) __builtin_aarch64_raddhn2v2di ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1458): error: identifier "__builtin_aarch64_pmulv8qi" is undefined
return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(1522): error: identifier "__builtin_aarch64_pmulv16qi" is undefined
return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2230): error: identifier "__builtin_aarch64_ssublv8qi" is undefined
return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2237): error: identifier "__builtin_aarch64_ssublv4hi" is undefined
return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2244): error: identifier "__builtin_aarch64_ssublv2si" is undefined
return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2251): error: identifier "__builtin_aarch64_usublv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2259): error: identifier "__builtin_aarch64_usublv4hi" is undefined
return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2267): error: identifier "__builtin_aarch64_usublv2si" is undefined
return (uint64x2_t) __builtin_aarch64_usublv2si ((int32x2_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2275): error: identifier "__builtin_aarch64_ssubl2v16qi" is undefined
return (int16x8_t) __builtin_aarch64_ssubl2v16qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2282): error: identifier "__builtin_aarch64_ssubl2v8hi" is undefined
return (int32x4_t) __builtin_aarch64_ssubl2v8hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2289): error: identifier "__builtin_aarch64_ssubl2v4si" is undefined
return (int64x2_t) __builtin_aarch64_ssubl2v4si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2296): error: identifier "__builtin_aarch64_usubl2v16qi" is undefined
return (uint16x8_t) __builtin_aarch64_usubl2v16qi ((int8x16_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2304): error: identifier "__builtin_aarch64_usubl2v8hi" is undefined
return (uint32x4_t) __builtin_aarch64_usubl2v8hi ((int16x8_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2312): error: identifier "__builtin_aarch64_usubl2v4si" is undefined
return (uint64x2_t) __builtin_aarch64_usubl2v4si ((int32x4_t) __a,
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2320): error: identifier "__builtin_aarch64_ssubwv8qi" is undefined
return (int16x8_t) __builtin_aarch64_ssubwv8qi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2327): error: identifier "__builtin_aarch64_ssubwv4hi" is undefined
return (int32x4_t) __builtin_aarch64_ssubwv4hi (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2334): error: identifier "__builtin_aarch64_ssubwv2si" is undefined
return (int64x2_t) __builtin_aarch64_ssubwv2si (__a, __b);
^
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h(2341): error: identifier "__builtin_aarch64_usubwv8qi" is undefined
return (uint16x8_t) __builtin_aarch64_usubwv8qi ((int16x8_t) __a,
^
Error limit reached.
100 errors detected in the compilation of "/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu".
Compilation terminated.
[29/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/dmmv.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/dmmv.cu.o
[30/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/getrows.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
[31/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/im2col.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
[32/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/mmq.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
[33/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f32.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
[34/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn-tile-f16.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
[35/111] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_F16 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_USE_GRAPHS -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/.. -I/opt/llama-cpp-python/vendor/llama.cpp/ggml/src/../include -isystem /usr/local/cuda/targets/aarch64-linux/include -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_87,code=[compute_87,sm_87]" -Xcompiler=-fPIC -MD -MT vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o -MF vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o.d -x cu -c /opt/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-cuda/fattn.cu -o vendor/llama.cpp/ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
ninja: build stopped: subcommand failed.
*** CMake build failed
error: subprocess-exited-with-error
× Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3.10 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp32pi2w7s
cwd: /opt/llama-cpp-python
Building wheel for llama_cpp_python (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for llama_cpp_python
Failed to build llama_cpp_python
ERROR: Failed to build one or more wheels
The command '/bin/sh -c /tmp/llama_cpp/install.sh || /tmp/llama_cpp/build.sh' returned a non-zero code: 1
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/louie001/jetson-containers/jetson_containers/build.py", line 112, in <module>
build_container(args.name, args.packages, args.base, args.build_flags, args.build_args, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api, args.skip_packages)
File "/home/louie001/jetson-containers/jetson_containers/container.py", line 147, in build_container
status = subprocess.run(cmd.replace(NEWLINE, ' '), executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag text-generation-webui:r36.4.2-llama_cpp --file /home/louie001/jetson-containers/packages/llm/llama_cpp/Dockerfile --build-arg BASE_IMAGE=text-generation-webui:r36.4.2-exllama --build-arg LLAMA_CPP_VERSION="0.3.2" --build-arg LLAMA_CPP_BRANCH="0.3.2" --build-arg LLAMA_CPP_FLAGS="-DGGML_CUDA=on -DGGML_CUDA_F16=on -DLLAMA_CURL=on" /home/louie001/jetson-containers/packages/llm/llama_cpp 2>&1 | tee /home/louie001/jetson-containers/logs/20250116_133527/build/text-generation-webui_r36.4.2-llama_cpp.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
louie001@localhost:~$