NVIDIA driver 525.147.05 causes ASAN crash

Overview

With NVIDIA driver 525.147.05, a CUDA program built with ASAN crashes when it calls cuInit(). If I switch libcuda.so to the 510.108.03 version, the program works fine.
So I think the crash is related to a change in the NVIDIA driver.

The behavior is odd: it differs from previous CUDA ASAN issues, and it is not an address conflict between ASAN and CUDA.

Steps to reproduce

// demo.cc
#include <cuda.h>
int main() {
  cuInit(0);
  return 0;
}

clang++ demo.cc -fsanitize=address -I/usr/local/cuda/include -lcuda -o demo -fuse-ld=gold

This problem only happens when both clang and the gold linker are used.
I have observed a similar issue with gcc, but I can no longer reproduce it.

Environment:
Ubuntu 20.04
NVIDIA driver 525.147.05
GeForce RTX 4090

Root cause analysis

Debugging details

ASAN uses AsanInitInternal() to reserve shadow memory for the sanitizer checks. Because this function may be called multiple times during the program's lifetime, ASAN uses a global variable, asan_inited, to track the initialization state and avoid repeated initialization. So there should be exactly one effective call to InitializeShadowMemory(), and it should run before main().

However, when the program is linked against libcuda.so.525.147.05, InitializeShadowMemory() is called twice. The first call is the normal initialization. The second happens inside cuInit(), which calls AsanInitInternal(). I printed the value and address of the global variable asan_inited at that point, and they differ from those seen during the first call. As a result, the program calls InitializeShadowMemory() again and crashes.
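
To illustrate why the guard matters, here is a minimal sketch of an init-once pattern. It is not the real ASAN code: FakeRuntime, InitInternal, shadow_setups and CountShadowSetups are hypothetical names made up for this example. It only shows that a guard flag protects one copy of the state, so if a program somehow ends up with two independent copies of the flag, the one-time setup runs once per copy.

```cpp
#include <vector>

// Hypothetical stand-in for an ASAN-like runtime; not the real implementation.
struct FakeRuntime {
  bool inited = false;    // plays the role of asan_inited
  int shadow_setups = 0;  // counts runs of the "InitializeShadowMemory" step

  void InitInternal() {
    if (inited) return;   // guard: later calls on this copy are no-ops
    ++shadow_setups;      // stand-in for reserving shadow memory
    inited = true;
  }
};

// Spreads `calls` init requests round-robin over `copies` independent copies
// of the runtime state (copies must be > 0) and returns how many times the
// one-time setup actually ran in total.
int CountShadowSetups(int copies, int calls) {
  std::vector<FakeRuntime> runtimes(copies);
  for (int i = 0; i < calls; ++i) {
    runtimes[i % copies].InitInternal();
  }
  int total = 0;
  for (const auto& r : runtimes) total += r.shadow_setups;
  return total;
}
```

With a single copy of the state, CountShadowSetups(1, 5) returns 1 no matter how many init calls are made. With two copies, CountShadowSetups(2, 5) returns 2, because each copy has its own flag that starts out false.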

Conclusion

If you print the address of the global variable asan_inited in the ASAN library and link the program against libcuda.so.510.108.03, the address stays the same throughout (as it should).
But if you link the program against libcuda.so.525.147.05, the address changes when the program calls cuInit().
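
A changing address suggests that a second copy of the ASAN state may be in play, although I have not confirmed this. One possible way to look for it is to list the objects mapped into the process before and after cuInit(). The sketch below uses glibc's dl_iterate_phdr(); the helper names CollectName and LoadedObjectsMatching are made up for this example. Note that clang links the ASAN runtime statically by default, so a runtime inside the main executable will not show up as a separate library in this list.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc, Linux)

#include <cstddef>
#include <string>
#include <vector>

namespace {
// Callback for dl_iterate_phdr: records the path of each loaded object.
// The main program is reported with an empty name.
int CollectName(struct dl_phdr_info* info, size_t /*size*/, void* data) {
  auto* names = static_cast<std::vector<std::string>*>(data);
  names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
  return 0;  // returning 0 keeps the iteration going
}
}  // namespace

// Returns the paths of all currently loaded objects whose name contains
// `needle`. An empty needle matches every object.
std::vector<std::string> LoadedObjectsMatching(const std::string& needle) {
  std::vector<std::string> all;
  dl_iterate_phdr(CollectName, &all);

  std::vector<std::string> hits;
  for (const auto& name : all) {
    if (name.find(needle) != std::string::npos) hits.push_back(name);
  }
  return hits;
}
```

For example, demo.cc could call LoadedObjectsMatching("asan") and LoadedObjectsMatching("cuda") once before cuInit(0) and once after, and print the results. Any new ASAN-related entry that appears after cuInit() would be worth reporting along with this issue.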

Other

The issue has also been reported to the ASAN community: "Shadow memory range interleaves with an existing memory mapping", Issue #1630 in google/sanitizers on GitHub.