CUDA + Intel C/C++ 9.0 in the same binary?

I’ve been trying to build a benchmark code with CUDA and the Intel C/C++ compiler V9.0, but have not managed to get anything working yet. The resulting binaries segfault immediately on startup, before main() is even entered. I’ve linked with -shared, with -i-static, and with various other permutations to see whether the Intel libraries are involved, but nothing changes, so they do not appear to be the cause. ‘ldd’ gives me the following information:

% ldd cuenergy
        libcuda.so => /usr/local/lib/libcuda.so (0x00e78000)
        libcudart.so => /usr/local/lib/libcudart.so (0x00d42000)
        libm.so.6 => /lib/tls/libm.so.6 (0x00b9c000)
        libc.so.6 => /lib/tls/libc.so.6 (0x008f4000)
        libdl.so.2 => /lib/libdl.so.2 (0x00fe5000)
        libptxcomp.so => /usr/local/lib/libptxcomp.so (0x00528000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00af8000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x003fa000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x003b4000)
        /lib/ld-linux.so.2 (0x007b1000)
        libfatZip.so => /usr/local/lib/libfatZip.so (0x00d26000)

It’s not a big deal since I can compile the benchmark code in two versions if necessary, but it would make life easier… :-)

Cheers,

John

John,

It should work.

Which compiler flags did you use?

Have you tried to link with icpc?

Could you run strace and see if you can spot the problem?

Thanks

Massimiliano

I tried linking with both icpc and icc; neither worked:

johns@toledo{140} gmake clean; gmake
rm -f main.o util.o cpuenergy.o cuenergy
nvcc -O3 -Xcompiler "-m32" -I.  -c main.cu
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 -c util.c
util.c(480) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(551) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(563) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
util.c(601) : (col. 30) remark: loop was not vectorized: unsupported loop structure.
util.c(620) : (col. 18) remark: loop was not vectorized: unsupported loop structure.
util.c(470) : (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient.
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 -c cpuenergy.c
cpuenergy.c(55) : (col. 3) remark: loop was not vectorized: not inner loop.
cpuenergy.c(60) : (col. 5) remark: loop was not vectorized: vectorization possible but seems inefficient.
cpuenergy.c(76) : (col. 5) remark: loop was not vectorized: not inner loop.
cpuenergy.c(95) : (col. 7) remark: LOOP WAS VECTORIZED.
icpc -cxxlib-gcc -fno-exceptions -fomit-frame-pointer -fno-math-errno -no-prec-sqrt -pc32 -msse3 -vec-report=3 main.o util.o cpuenergy.o -o cuenergy -cxxlib-gcc -shared -i-static -L/usr/local/encap/cuda-0.8//lib -lcuda -lcudart

I also disabled multi-file optimizations and various other things, but the code crashes immediately.
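Since the final link line above passes -shared, one thing that may be worth ruling out is whether the output file is an ordinary executable at all: -shared asks the linker for a shared object rather than a program, and a shared object run directly can crash before any user code executes. The sketch below is only an illustration, not part of the benchmark; the function name elf_file_type is hypothetical, and it assumes a Linux system that provides <elf.h>. It reads the ELF header of a file and returns its type field.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return the ELF e_type field of the file at `path`:
 * ET_EXEC (2) for a fixed-address executable, ET_DYN (3) for a shared
 * object or position-independent executable. Returns -1 if the file
 * cannot be read or does not start with the ELF magic bytes.
 * (elf_file_type is a hypothetical helper name for this sketch.) */
int elf_file_type(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];   /* e_ident followed by 16-bit e_type */
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);

    if (n != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    /* e_type sits at offset 16 in both ELF32 and ELF64 headers;
     * decode it using the byte order recorded in the file itself. */
    if (hdr[EI_DATA] == ELFDATA2MSB)
        return (hdr[EI_NIDENT] << 8) | hdr[EI_NIDENT + 1];
    return hdr[EI_NIDENT] | (hdr[EI_NIDENT + 1] << 8);
}
```

Calling elf_file_type("cuenergy") and comparing the result with ET_EXEC and ET_DYN shows which kind of file the link step produced; `readelf -h cuenergy` reports the same field on its "Type:" line. On a toolchain of this vintage an ET_DYN result for a program would be suspicious, although newer systems also use ET_DYN for position-independent executables, so the value alone is not conclusive.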

Here’s the output of strace:

johns@toledo{142} strace ./cuenergy
execve("./cuenergy", ["./cuenergy"], [/* 56 vars */]) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 3907 detached

If I recompile everything with -g, drop -fomit-frame-pointer and most of the other optimizations, and run it in GDB, I get almost nothing, which tells me it’s dying in one of the shared library initialization routines, possibly the CUDA libs:

johns@toledo{147} gdb cuenergy
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) run
Starting program: /Projects/vmd/cuda/johns/cuenergybench/cuenergy
Program received signal SIGSEGV, Segmentation fault.
0x00000001 in ?? ()
(gdb) where
#0  0x00000001 in ?? ()
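One way to test the guess that the crash happens during shared library initialization is to link a tiny probe into the program. The sketch below is an assumption-laden illustration rather than part of the benchmark: the names early_probe and early_probe_ran are hypothetical, and it assumes the compiler accepts the GCC-style constructor attribute. With the glibc dynamic loader, a program's own constructors normally run after the initializers of the shared libraries it depends on and before main().

```c
#include <stdio.h>

/* Set to 1 once the constructor below has executed. */
static int early_probe_hit = 0;

/* GCC-style constructor (hypothetical probe for this sketch): runs before
 * main(), normally after the dynamic loader has finished initializing the
 * shared libraries the program depends on. */
__attribute__((constructor))
static void early_probe(void)
{
    early_probe_hit = 1;
    /* stderr is unbuffered, so the message appears even if a crash follows. */
    fputs("early_probe: reached program constructors\n", stderr);
}

/* Lets other code confirm that the constructor ran. */
int early_probe_ran(void)
{
    return early_probe_hit;
}
```

If the message appears before the segfault, the library initializers most likely completed and the fault lies later in startup; if it never appears, the crash precedes the program's own constructors, which would be consistent with a failure in library initialization or in loading the binary itself. The order among the program's own constructors (including any emitted for main.o) depends on link order, so the probe only gives a rough boundary.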

Suggestions are welcome…

John