[SOLVED] Segfault on RHEL 6.10 (compiled with CUDA 9.1 and static linkage, runs on Ubuntu 16.04)

I wrote a program on Ubuntu 16.04 with CUDA 9.1 and driver R390.30, using static linkage (-Xcompiler -static). It runs correctly and produces the expected results.
Compiling on a MBP late 2012 (GT650M) with Xcode 9.4 and CUDA 10 also goes without any trouble and runs just fine.

However, I need to run this on a machine at work (where I have no admin access or the right to install anything), so I have to use the statically linked exe from Ubuntu. The C++ part runs correctly but it segfaults at the kernel function.

The machine has RHEL 6.10 with a GRID P40-8Q and driver 390.75. Unfortunately I can’t compile here.
These are the last lines of strace on the process just before it segfaults:

...
open("/global/lib//libnvidia-fatbinaryloader.so.390.75", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/global/lib/libnvidia-fatbinaryloader.so.390.75", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib64/libnvidia-fatbinaryloader.so.390.75", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000Y\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=291496, ...}) = 0
mmap(NULL, 2407944, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1d720fc000
mprotect(0x7f1d72139000, 2093056, PROT_NONE) = 0
mmap(0x7f1d72338000, 45056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3c000) = 0x7f1d72338000
mmap(0x7f1d72343000, 19976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f1d72343000
close(3)                                = 0
open("/global/distlib//ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/global/distlib/ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/global/lib//ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/global/lib/ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib64/ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\v@\3450\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=161776, ...}) = 0
mmap(0x30e5400000, 2236816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x30e5400000
mprotect(0x30e5420000, 2097152, PROT_NONE) = 0
mmap(0x30e5620000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x30e5620000
mmap(0x30e5622000, 400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x30e5622000
close(3)                                = 0
mprotect(0x30e5620000, 4096, PROT_READ) = 0
mprotect(0x30e5f8a000, 16384, PROT_READ) = 0
mprotect(0x30e5a02000, 4096, PROT_READ) = 0
mprotect(0x30e6217000, 4096, PROT_READ) = 0
mprotect(0x30e6e06000, 4096, PROT_READ) = 0
mprotect(0x30e6682000, 4096, PROT_READ) = 0
set_tid_address(0x202bb90)              = 11361
set_robust_list(0x202bba0, 24)          = 0
futex(0x7ffcda6549ac, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7ffcda6549ac, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 202b8c0) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigaction(SIGRTMIN, {0x30e6005cb0, [], SA_RESTORER|SA_SIGINFO, 0x30e600f7e0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x30e6005d40, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x30e600f7e0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM64_INFINITY}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
+++ killed by SIGSEGV (core dumped) +++

There are a few “No such file or directory” lines in there, so something must be missing from this executable. What do you suggest I inspect or try?

=========== EDIT ===========

I’ve added the path of these missing libs to my LD_LIBRARY_PATH and tried again. While strace looks for the files there, it still says they can’t be found.
It fails around the launching of a kernel function, which is preceded by a few declarations of thrust::device_vector.

The sections that use thrust::host_vector are running correctly, but my next printf is just after the kernel finishes, so I don’t know if it crashed when allocating device_vector or running the kernel.

=========== EDIT 2 ============

I added /lib64 so it could find ld-linux-x86-64.so.2. All of these files are found now.
Maybe I need to get the ld-linux-x86-64.so.2 of the Glibc I used for compilation?

If any of you have successfully compiled and statically linked a CUDA program on one distro and run it on another, let me know if you have any advice.
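
On the Glibc question in EDIT 2: as far as I know, a statically linked glibc program that still ends up calling dlopen (which the strace suggests is happening here, since shared NVIDIA libraries get loaded at run time) expects the shared glibc of the version it was linked against. A cheap first check is to print the glibc version on both machines and compare. The sketch below assumes a glibc-based system; `glibc_report` is a hypothetical helper name. In a fully static binary both numbers come from the build machine, so the interesting comparison is against the same report built and run natively on the target.

```cpp
#include <cstdlib>   // on glibc systems this pulls in <features.h>, which defines __GLIBC__
#include <sstream>
#include <string>

#if defined(__GLIBC__)
#include <gnu/libc-version.h>  // gnu_get_libc_version()
#endif

// Hypothetical helper: report the glibc the binary was built against
// and the glibc version string reported at run time.
inline std::string glibc_report() {
    std::ostringstream os;
#if defined(__GLIBC__)
    os << "built against glibc " << __GLIBC__ << "." << __GLIBC_MINOR__
       << ", running with glibc " << gnu_get_libc_version();
#else
    os << "not built against glibc";
#endif
    return os.str();
}
```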

Why can’t you compile on the target machine? If you can deposit files on that machine and run executables on it, it’s not obvious to me what stops you from compiling there. The main reason I can think of would be extremely limited file space in your user account (say, less than 3 GB).

If you must compile somewhere else, I think your best chance of success is to set up a machine that closely matches the target machine in terms of software configuration (compiler toolchain, libraries, etc.).

For example, you can spin up a CUDA development docker container on your Ubuntu machine that closely matches the target, and compile inside it.

I will check what version of GCC is there and get hold of a local CUDA installation tomorrow!
There is nothing in the code that requires the latest SDK release anyway.
Thanks for the simplest solution!

I managed to install CUDA 7.5 and it compiles stuff without issues. Well, most stuff.

My actual program uses C++ regular expression functions to parse some text, so I have to compile with -std=c++0x. If I just call g++ with this flag, it compiles without problems. But nvcc doesn’t like it and says:

nvcc fatal   : Value 'c++0x' is not defined for option 'std'

I searched around and other people seem to have hit the same problem; some of them fixed it by exporting CXX=g++ (in Ubuntu/CentOS discussions). If I do that, nvcc outputs a bunch of arcane errors.

Is it something with CUDA 7.5 nvcc? The gcc version of RHEL 6.x is 4.4.7.

======== EDIT =========

I get the same error from nvcc when using CUDA 9.1 (all 3 patches applied) on a RHEL 6.10 with GCC 4.4.7 (machine with R390.75 driver installed).

======== EDIT 2 =========

After reading https://devtalk.nvidia.com/default/topic/973179/cuda-8-support-for-c-14-windows-linux/, the fix seems to be a clean split between host and device code. Right now I have a .cu file with all the host functions and __global__ kernels, which I call from the main program .cu file.

So should I place the kernels in a .cu file, the host code in a .cpp, and the main program in a .cpp, then try again? Or is it just a compiler version problem?
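
A rough sketch of what that split could look like, assuming the regex parsing lives in a .cpp built by g++ and the only thing it sees of the CUDA side is a plain function declaration. All names here (`launch_scale`, `parse_values`, the file names in the comments) are hypothetical. To keep the sketch buildable without CUDA, `launch_scale` is given a CPU stand-in; in the real layout it would be defined in the .cu file and launch the kernel. It is also worth checking that std::regex really works at run time with the target's compiler, since as far as I know libstdc++ only got a complete implementation in GCC 4.9.

```cpp
#include <cstddef>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// --- kernels.h (hypothetical): the only interface the host code sees ---
// In the real project this would be defined in a .cu file built by nvcc and
// would copy to the device, launch the __global__ kernel, and copy back.
void launch_scale(float* data, std::size_t n, float factor);

// --- host.cpp (hypothetical): built by g++ with the C++ flags it needs ---
// Pull every decimal number out of a line of text with std::regex.
std::vector<float> parse_values(const std::string& text) {
    static const std::regex number("[-+]?[0-9]*\\.?[0-9]+");
    std::vector<float> values;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end;
         it != end; ++it) {
        values.push_back(static_cast<float>(std::atof(it->str().c_str())));
    }
    return values;
}

// --- CPU stand-in so this sketch links without CUDA ---
// Not a kernel: it only mimics what the nvcc-built launcher would compute.
void launch_scale(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
```

With this layout nvcc never has to parse the regex code, so the -std flag only needs to be accepted by g++.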

You could try:

nvcc -Xcompiler -std=c++0x …

I tried, but then instead of getting just that error message, it starts complaining about stl_iterator and other errors from the beyond.

I will do the host and device code segregation as suggested in that thread and see if g++ handles the c++0x dependency by itself instead of nvcc. Damn this …

The RHEL 6.10 machine had the 6.3.1-3 toolchain installed, so I just had to make it the default.
nvcc compiled everything without any modification. I just had to use CUDA 9.1 instead of 7.5 (which doesn’t like 6.3.1), which limits me to machines with the 390 driver. The ones with the 352 driver are out, but that’s life.
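
For anyone hitting the same thing, a quick way to confirm which host compiler a build actually picked up is to print the compiler's own version macros. This is a small sketch using the standard GCC predefined macros; `host_compiler_report` is a hypothetical name. My understanding is that when the file is built through nvcc, these macros reflect the host compiler nvcc found (or the one passed with -ccbin).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: describe the compiler that built this translation unit.
inline std::string host_compiler_report() {
    std::ostringstream os;
#if defined(__GNUC__)
    // Defined by GCC and by compilers that emulate it.
    os << "GNU-compatible compiler " << __GNUC__ << "." << __GNUC_MINOR__
       << "." << __GNUC_PATCHLEVEL__;
#else
    os << "non-GNU compiler";
#endif
    // Language level the compiler was invoked with.
    os << ", __cplusplus=" << __cplusplus;
    return os.str();
}
```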