BUG: libnvptxcompiler_static.a segfaults when linked with LLD on x86_64

libnvptxcompiler_static.a crashes with a segmentation fault in pthread_mutex_lock() when linked with LLVM’s LLD linker on x86_64. The crash occurs because a mutex pointer is NULL (the faulting address 0x10 is a small field offset from a NULL pointer). The same code works correctly when linked with GNU ld, and works on aarch64 with LLD.

All tested CUDA versions on x86_64 exhibit this issue:

- CUDA 12.6.85

- CUDA 12.8.93

- CUDA 13.0.88

- CUDA 13.1.115

Works correctly on aarch64:

- CUDA 13.1.115

We tested 8 different environment configurations on x86_64 (all targeting sm_86):

| OS | CUDA Source | CUDA Version | Clang | LLD | Result |
|----|-------------|--------------|-------|-----|--------|
| Fedora 42 | Fedora 42 repos | 13.1 | 20.1.8 | 20.1.8 | SEGFAULT |
| Fedora 42 | RHEL10 repos | 13.1 | 20.1.8 | 20.1.8 | SEGFAULT |
| Fedora 43 | Fedora 42 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL10 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL10 repos | 13.0 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL9 repos | 12.8 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL9 repos | 12.6 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora Rawhide | RHEL10 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |

When using GNU ld instead, all configurations work.

On aarch64 (Apple Silicon) things work as expected:

- Fedora 43 + CUDA 13.1 + Clang 21.1.8 + LLD 21.1.8

Environment

x86_64 (affected):

- CPU: Intel Tigerlake (tested in Docker on Linux 6.18.7)

- OS: Fedora 42/43/Rawhide (tested across multiple variants)

- Toolchain: Clang 20.1.8 - 21.1.8, LLD (same versions), GCC 15.2.1

- GNU ld: 2.45.1

aarch64 (not affected):

- CPU: Apple Silicon M-series

- OS: Fedora 43 (Linux 6.10.14-linuxkit)

- Same Clang/LLD versions

Reproduction:

```cpp
#include <cstring>
#include <cstdio>
#include <nvPTXCompiler.h>

int main() {
    const char *ptx_code = R"(
        .version 7.0
        .target sm_86
        .address_size 64
        .visible .entry dummy_kernel() { ret; }
    )";

    nvPTXCompilerHandle compiler;
    nvPTXCompilerCreate(&compiler, strlen(ptx_code), ptx_code);

    const char *options[] = {"--gpu-name=sm_86"};
    nvPTXCompilerCompile(compiler, 1, options);  // CRASH HERE

    nvPTXCompilerDestroy(&compiler);
    return 0;
}
```

Crashes (LLD on x86_64):

```sh
clang++ -fuse-ld=lld -o test main.cpp \
    -I/usr/local/cuda/include \
    /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Segmentation fault
```

Works (GNU ld on x86_64):

```sh
clang++ -o test main.cpp \
    -I/usr/local/cuda/include \
    /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Success
```

Works (LLD on aarch64):

```sh
clang++ -fuse-ld=lld -o test main.cpp \
    -I/usr/local/cuda/include \
    /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Success
```

Stack Trace (UBSan):

```
==697==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000010
==697==The signal is caused by a READ memory access.
==697==Hint: address points to the zero page.
#0 pthread_mutex_lock@@GLIBC_2.2.5 (/lib64/libc.so.6)
#1 libnvptxcompiler_static_10d97869a92ae1171b368b8b13994d673e6bb182
#2 __cuda_CallJitEntryPoint
#3 nvPTXCompilerCompile
#4 main
```

Register `rdi` = `0x0000000000000000` (NULL mutex pointer)
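
The fault address is consistent with glibc's mutex layout: on x86_64, `pthread_mutex_lock()` starts by reading the mutex "kind" field, which lives at offset 0x10 inside `pthread_mutex_t`, so calling it with a NULL mutex pointer faults at exactly address 0x10. Here is a minimal sketch of that failure mode (an illustration only, not the library's actual code):

```cpp
#include <pthread.h>

int main() {
    // Stands in for the library's global mutex whose initializer never ran:
    // the pointer handed to pthread_mutex_lock() ends up being NULL.
    pthread_mutex_t *never_initialized = nullptr;

    // glibc reads the mutex type at offset 0x10 of the (NULL) mutex,
    // producing a read fault at address 0x10, as in the UBSan report above.
    pthread_mutex_lock(never_initialized);
    return 0;
}
```

Under a debugger, this toy program should show the same pattern: `rdi` (the first argument) equal to 0 and a read fault at 0x10.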

To summarize: the same code works on aarch64 but fails on x86_64 with an identical toolchain; it fails only with LLD and works with GNU ld; and the failure is independent of the CUDA and Clang/LLD versions tested.

This prevents use of `libnvptxcompiler_static.a` in any project using the LLVM toolchain with LLD on x86_64, as well as in Rust projects (Rust uses LLD by default).

The workaround is to force GNU ld:

```sh
clang++ -fuse-ld=ld …  # Use GNU ld
# or simply omit the -fuse-ld flag to use the default linker
```

This looks related to an old, known issue: some CUDA static libraries still use `.ctors` and `.dtors` sections instead of `.init_array` and `.fini_array`, respectively.

As a result, the resulting executable ends up containing both types of sections, and glibc's startup code silently ignores `.ctors` and `.dtors`, so the library's global mutexes are never initialized. That is why we get this odd failure: an attempt to call `pthread_mutex_lock()` on a NULL pointer.

`ld` (`ld.bfd`) and `ld.gold` by default merge and rename `.ctors`/`.dtors` into the modern `.init_array`/`.fini_array` sections, but `ld.lld` does not, since `.ctors`/`.dtors` have been considered deprecated for decades.
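
Below is a minimal standalone sketch of that difference (my own illustration, not NVIDIA's code; the file name `ctors_demo.cpp` is made up). It plants one constructor pointer directly in the legacy `.ctors` section, roughly what the CUDA library's objects contain, plus one ordinary constructor that the compiler emits into `.init_array`. Building with `clang++ -fuse-ld=bfd ctors_demo.cpp -o demo` should print both messages, because GNU ld's default linker script folds `.ctors` into `.init_array`; building with `clang++ -fuse-ld=lld ctors_demo.cpp -o demo` is expected to print only the `.init_array` one, because nothing in a modern startup sequence walks the orphaned `.ctors` section.

```cpp
// ctors_demo.cpp (hypothetical file name, illustration only)
#include <cstdio>

static void legacy_init() { std::puts("legacy .ctors constructor ran"); }

// Emulate an old object file: place a constructor pointer directly into the
// legacy .ctors section, similar to what libnvptxcompiler_static.a's objects ship.
__attribute__((used, section(".ctors")))
static void (*legacy_entry)() = legacy_init;

// An ordinary global constructor: modern compilers emit this into .init_array.
__attribute__((constructor))
static void modern_init() { std::puts(".init_array constructor ran"); }

int main() { return 0; }
```

Whether a given binary or archive still carries the legacy sections can be checked with `llvm-readelf -S` (or `objdump -h`).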

This appears to have been fixed for most CUDA binaries around CUDA 11; `libcudart_static`, for example, is no longer affected. The aarch64 static libraries are not affected either.

People apparently first discovered and reported this bug to LLVM in 2016, but because the issue is so specific to CUDA on x86_64, the associated patch was never merged.

As a workaround, one could simply patch the affected sections in place, e.g.:

```sh
llvm-objcopy --rename-section .ctors=.init_array --rename-section .dtors=.fini_array \
    /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvptxcompiler_static.a
```

References:

1. llvm/llvm-project issue #30572: "lld produces broken executable with CUDA"

2. .ctors sections in static libraries