I’ve been trying to compile a benchmark code with CUDA and the Intel C/C++ compiler V9.0, but have not managed to get anything working yet. The resulting binaries segfault immediately on startup, before main() is even entered. I’ve built with -shared, with -i-static, and with various other permutations to see whether the Intel libraries are involved, but nothing so far indicates that they are. ‘ldd’ gives me the following information:
I tried linking with both icpc and icc; neither one worked:
johns@toledo{140} gmake clean; gmake
rm -f main.o util.o cpuenergy.o cuenergy
nvcc -O3 -Xcompiler "-m32" -I. -c main.cu
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 -c util.c
util.c(480) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(551) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(563) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(601) : (col. 30) remark: loop was not vectorized: unsupported loop structure.
util.c(620) : (col. 18) remark: loop was not vectorized: unsupported loop structure.
util.c(470) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 -c cpuenergy.c
cpuenergy.c(55) : (col. 3) remark: loop was not vectorized: not inner loop.
cpuenergy.c(60) : (col. 5) remark: loop was not vectorized: vectorization possible but seems inefficient.
cpuenergy.c(76) : (col. 5) remark: loop was not vectorized: not inner loop.
cpuenergy.c(95) : (col. 7) remark: LOOP WAS VECTORIZED.
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 main.o util.o cpuenergy.o -o cuenergy -cxxlib-gcc -shared -i-static -L/usr/local/encap/cuda-0.8//lib -lcuda -lcudart
I also disabled multi-file optimizations and various other things, but the binary still crashes immediately.
Here’s the output of strace:
johns@toledo{142} strace ./cuenergy
execve("./cuenergy", ["./cuenergy"], [/* 56 vars */]) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 3907 detached
If I recompile everything with -g, drop -fomit-frame-pointer and most of the other optimizations, and run it in GDB, I get almost nothing, which suggests it’s dying in one of the shared library initialization routines, possibly in the CUDA libs:
johns@toledo{147} gdb cuenergy
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run
Starting program: /Projects/vmd/cuda/johns/cuenergybench/cuenergy
Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()
(gdb) where
#0 0x00000001 in ?? ()
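One way to test the suspicion that the crash happens inside a shared library initializer, sketched here assuming a glibc-based Linux system whose dynamic loader honors the LD_DEBUG environment variable: with LD_DEBUG=libs the loader logs a "calling init:" line to stderr before it runs each library's initializer, so the last library listed before the SIGSEGV is a reasonable first suspect. The helper name init_order is hypothetical, and if the loader itself never gets to run, this trace will simply be empty.

```shell
#!/bin/sh
# Hypothetical helper (not part of any toolchain): read LD_DEBUG=libs
# output on stdin and print only the library paths from the
# "calling init:" lines, in the order the loader reported them.
init_order() {
    sed -n 's/.*calling init: //p'
}

# Assumption: the binary under test is ./cuenergy in the current directory.
# The loader writes its trace to stderr, so merge it into the pipe.
if [ -x ./cuenergy ]; then
    LD_DEBUG=libs ./cuenergy 2>&1 | init_order
fi
```

If the list is empty, the process is dying before any initializer runs, which would point away from the CUDA libraries and toward how the executable itself was linked.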