deviceQuery with statically-linked cudart segfaults when run on target platform


I am getting a segfault when running deviceQuery on our target Linux system. deviceQuery
was built with our cross-compiler and statically linked against the libcudart_static library.

If I add --cudart shared to the NVCCFLAGS in the deviceQuery Makefile (i.e., no static linking),
then deviceQuery runs as expected on our target platform. It does require the shared cudart library to
be present on the target.
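
For reference, the Makefile change described above amounts to roughly the following. This is a sketch, assuming the stock samples Makefile where NVCCFLAGS is assigned once and then carried into both the compile and link steps. The exact placement of the line is my assumption, not a copy of our file.

```make
# Sketch of the workaround: place after the existing NVCCFLAGS assignment
# in the deviceQuery Makefile (placement is an assumption).
# --cudart shared tells nvcc to link the shared CUDA runtime
# instead of the default libcudart_static.
NVCCFLAGS += --cudart shared
```

With this line removed, nvcc falls back to its default of static cudart linking, which is the configuration that segfaults.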

Graphics card: RTX A4000 and RTX A2000
CUDA driver: 510.60.02
CUDA toolkit: 11.6.2

I am building deviceQuery with:
TARGET_ARCH=x86_64 SMS="86" HOST_COMPILER=path to the toolchain/bin/g++ make

gdb backtrace:

(gdb) r
Starting program: /root/deviceQuery
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/".
/root/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

[New Thread 0x7fffdffff700 (LWP 65542)]

Thread 1 "deviceQuery" received signal SIGSEGV, Segmentation fault.
0x000000000043072f in __cudart792 ()
(gdb) bt
#0  0x000000000043072f in __cudart792 ()
#1  0x000000000041954f in __cudart523 ()
#2  0x0000000000422394 in __cudart1330 ()
#3  0x00007ffff7f9f263 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
#4  0x00000000004700c9 in __cudart1606 ()
#5  0x0000000000418fb7 in __cudart513 ()
#6  0x0000000000441961 in cudaGetDeviceCount ()
#7  0x0000000000403f67 in main ()

Any idea how to resolve this? All of our other CUDA applications show the
same behavior when statically linked against cudart.

Thanks for the help,