libnvptxcompiler_static.a crashes with a segmentation fault in pthread_mutex_lock() when linked with LLVM's LLD linker on x86_64. The crash occurs because a mutex pointer is NULL: the fault address 0x10 corresponds to a member offset within a NULL struct. The same code works correctly when linked with GNU ld, and also works on aarch64 with LLD.
All tested CUDA versions on x86_64 exhibit this issue:
- CUDA 12.6.85
- CUDA 12.8.93
- CUDA 13.0.88
- CUDA 13.1.115

All tested versions work correctly on aarch64:
- CUDA 13.1.115

We tested across 8 different environment configurations on x86_64 (all targeting sm_86):
| OS | CUDA Source | CUDA Version | Clang | LLD | Result |
|----|-------------|--------------|-------|-----|--------|
| Fedora 42 | Fedora 42 repos | 13.1 | 20.1.8 | 20.1.8 | SEGFAULT |
| Fedora 42 | RHEL10 repos | 13.1 | 20.1.8 | 20.1.8 | SEGFAULT |
| Fedora 43 | Fedora 42 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL10 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL10 repos | 13.0 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL9 repos | 12.8 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora 43 | RHEL9 repos | 12.6 | 21.1.8 | 21.1.8 | SEGFAULT |
| Fedora Rawhide | RHEL10 repos | 13.1 | 21.1.8 | 21.1.8 | SEGFAULT |
With GNU ld instead of LLD, all of the above configurations work.

On aarch64 (Apple Silicon), everything works as expected:
- Fedora 43 + CUDA 13.1 + Clang 21.1.8 + LLD 21.1.8
Environment
x86_64 (affected):
- CPU: Intel Tigerlake (tested in Docker on Linux 6.18.7)
- OS: Fedora 42/43/Rawhide (tested across multiple variants)
- Toolchain: Clang 20.1.8 - 21.1.8, LLD (same versions), GCC 15.2.1
- GNU ld: 2.45.1
aarch64 (not affected):
- CPU: Apple Silicon M-series
- OS: Fedora 43 (Linux 6.10.14-linuxkit)
- Same Clang/LLD versions
Reproduction:
```cpp
#include <cstring>
#include <cstdio>
#include <nvPTXCompiler.h>

int main() {
    const char *ptx_code = R"(
.version 7.0
.target sm_86
.address_size 64
.visible .entry dummy_kernel() { ret; }
)";
    nvPTXCompilerHandle compiler;
    nvPTXCompilerCreate(&compiler, strlen(ptx_code), ptx_code);
    const char *options[] = {"--gpu-name=sm_86"};
    nvPTXCompilerCompile(compiler, 1, options); // CRASH HERE
    nvPTXCompilerDestroy(&compiler);
    return 0;
}
```

Crashes (LLD on x86_64):

```sh
clang++ -fuse-ld=lld -o test main.cpp \
  -I/usr/local/cuda/include \
  /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Segmentation fault
```

Works (GNU ld on x86_64):

```sh
clang++ -o test main.cpp \
  -I/usr/local/cuda/include \
  /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Success
```

Works (LLD on aarch64):

```sh
clang++ -fuse-ld=lld -o test main.cpp \
  -I/usr/local/cuda/include \
  /usr/local/cuda/lib64/libnvptxcompiler_static.a -ldl -lpthread
./test  # Success
```
Stack Trace (UBSan):

```
==697==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000010
==697==The signal is caused by a READ memory access.
==697==Hint: address points to the zero page.
    #0 pthread_mutex_lock@@GLIBC_2.2.5 (/lib64/libc.so.6)
    #1 libnvptxcompiler_static_10d97869a92ae1171b368b8b13994d673e6bb182
    #2 __cuda_CallJitEntryPoint
    #3 nvPTXCompilerCompile
    #4 main
```

Register rdi = 0x0000000000000000 (NULL mutex pointer)
In summary: the failure is x86_64-specific (aarch64 works with an identical toolchain), linker-specific (LLD fails, GNU ld works), and independent of the CUDA, Clang, and LLD versions tested.
This prevents use of `libnvptxcompiler_static.a` in any project using the LLVM toolchain with LLD on x86_64, including Rust projects (Rust uses LLD by default).
The workaround is to force GNU ld:

```sh
clang++ -fuse-ld=bfd …   # use GNU ld (ld.bfd) explicitly
# or simply omit the -fuse-ld flag to use the default linker
```
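For Rust projects, the equivalent workaround is to route the flag through `rustflags`; a sketch, assuming a Cargo-based build on x86_64 Linux (adjust the target triple as needed):

```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=bfd"]
```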