NVCC forces C++ compilation of .cu files

I started programming CUDA a couple of months ago at my university. Our task was to optimize a video encoder, and the code we were given was written in C. This caused a lot of trouble when including .cu files, since NVCC 2.x actually compiles these files as C++. C++ compilation mangles function symbols in the object code, which resulted in linker errors of the type "undefined reference" when linking .cu object code against C-compiled code (in the cases where the .c and .cu source files share headers). The situation can be remedied by putting all shared function declarations inside extern "C" { }.

While the NVCC documentation still lists options for steering this aspect of compilation, e.g. the --host-compilation flag, these are deprecated as of the latest NVCC version. In my current situation I am developing on an open source project written entirely in C, and while I am able to compile it successfully, the C++ compiler is a lot stricter than the C compiler and generates errors in the native code.

Therefore: is it still possible to generate C-compiled host code with the latest version of NVCC?

The CUDA runtime API makes extensive use of C++ features, so it is not possible to compile code written for the runtime API (as opposed to the driver API) with a C compiler. Even the --host-compilation C option didn't switch to the C compiler; it just adjusted a few settings so that the C++ compiler handled the code in a more C-like way. The option probably is deprecated for exactly that reason: it doesn't actually do what it pretends to do.

I'd suggest moving all your CUDA (host and device) code to a separate file. That way all your non-CUDA code can still be compiled with the C compiler, and extern "C" declarations in the CUDA part can be used to get the interface right (and well-defined).

I was facing the same problem adapting a C project to use CUDA, just like memstick reported. I had been looking for a proper solution, and this suggestion by tera did the trick for me. I just put extern "C" before the function definition in my cudaWrapper.cu file, and linking the C and CUDA object files worked fine.

To be more specific, I did this in my C file:

// File: calc.c

extern int func_B(int x, int y);

int func_A(void)
{
    int a, b, c, result;
    // code, code, then
    c = func_B(a, b);
    // more code, then
    return result;
}

And this, in my CUDA file:

// File: cudaWrapper.cu

extern "C" int func_B(int x, int y)
{
    int result;
    // C and CUDA code, then
    return result;
}

So, no more annoying linker messages about "undefined references" to func_B defined in the CUDA file.

Thank you very much. To both of you.

This is a very timely thread, because I just came upon a similar problem. I've inherited a lot of legacy 'C' code and I have to call some of the routines from my CUDA kernel. The main problem I run into is that the developers had implemented some simulated classes and inheritance, and actually used the C++ reserved word 'this' as a parameter name (C++ itself does something similar internally). Anyway, when given a .cu file, nvcc seems to be invoking GCC in C++ mode. This only surfaced today: in the last few days we had changed the makefiles to use nvcc for everything, but there were only .c files, and everything compiled, linked and ran fine. Now we are trying to add .cu to the SUFFIXES so that make will recognize those files, and the exact same files come up with complaints about 'this'. I can restrict my CUDA kernel code to just a few files, some pure CUDA, and some with shared host/device code that also contain host-only code.

Is there a way to force nvcc to invoke the host compiler (gcc) as a ‘C’ only compiler and not a C++ compiler?

Thanks,

Mike

Hi Mike,

I do not think it is possible to compile .cu files as pure C with NVCC; I've spent a lot of time trying to do it, without success. In the end the solution was to wrap all project header includes within the .cu files in extern "C" { }. As tera mentioned above, a trick if you're stuck in a big C project is to move all your CUDA code into a separate .cu file (no host code) and compile that (and only that file) with nvcc. The rest you can compile with gcc as normal. Just remember the extern "C" wrapper around your program's header files in the .cu file. Otherwise, as I mentioned in my first post, the functions declared there will get compiled with C++ symbol semantics (not syntax, sorry). In the final stage the linker will then not be able to match the shared C and C++ functions because of the different symbol names.
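To make that layout concrete, here is a minimal sketch of such a wrapper file (the header and function names are hypothetical, not from any particular project):

```cuda
// cudaWrapper.cu -- the only file compiled by nvcc (sketch)
extern "C" {
#include "video.h"      // plain C project headers: the wrapper keeps
#include "support.h"    // their declarations at C linkage
}

// Entry point callable from the C side of the project
extern "C" void cuda_encode_frame(unsigned char *frame, int width, int height)
{
    // cudaMalloc / cudaMemcpy, kernel launches, cudaMemcpy back...
}
```

Everything else stays .c, is compiled by gcc, and sees only an ordinary C prototype for cuda_encode_frame in a shared header.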

Memstick, Thanks. I’ll try that and if it doesn’t work completely, then I guess I’ll go through to make sure that everything can be compiled with a C++ compiler.

Another solution, which I think would be a lot easier, is to simply set up a couple of make rules so that .c files are compiled by gcc, and .cu files by nvcc. E.g.

.c.o:

	gcc $(CFLAGS) -c $<

.cu.o:

	nvcc $(NVFLAGS) -c $<

Then link everything with “nvcc -o $(OBJS)…”. Works for me, anyway.

Yeah, this is what I did. However, a problem arises when some code in your .cu file is shared and gets compiled by both nvcc and gcc (via your headers). In those cases you're likely to get linker errors about unresolved references.

Well, I've made all the code compile with nvcc/g++ and now I've finally started adding the CUDA kernel code. ALL of the CUDA kernel code will be shared with the host, so I added __host__ __device__ to all the shared functions in the .cu file and in the .h file. Now, when I compile I get something like this:

source='../../../src/mpeg/video.c' object='video.o' libtool=no \

    DEPDIR=.deps depmode=none /bin/sh ../../../depcomp \

    nvcc -DHAVE_CONFIG_H -I. -I../../../src/mpeg -I../..  -I../../../src/support     -O3 -g -arch=compute_20 -D__CUDA_ARCH__=200 -DMSI_IS_LINUX --compiler-options -Wno-format --compiler-options -Wimplicit-int --compiler-options -Wparentheses --compiler-options -Wreturn-type --compiler-options -Wsequence-point --compiler-options -Wswitch -DCUDA_PROCESSING --compiler-options -march=core2 --compiler-options -m64 --compiler-options -msse2 --compiler-options -mcx16 --compiler-options -msahf --compiler-options -pipe -D_DEBUG_ -D_DEBUG_PRINT_STDERR_NO_ -D_ARCH64_  -c ../../../src/mpeg/video.c

../../../src/mpeg/video.h:695: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'

../../../src/mpeg/video.h:702: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__device__'

make: *** [msiVideoMp4.o] Error 1

One of the prototypes from the .h file looks like this:

__host__ __device__ status_e video_decodeIFrameMacroblocks(uint8 *buff, ...);

So, it doesn't like the __host__ __device__ qualifiers in front of the prototypes.

There is NO way that I can separate out the CUDA code and put it in a separate .cu file as was suggested earlier in this thread. As I said, ALL functions which will be in the kernel will be shared with the host (except of course for the __global__ kernelMain(...)).

Any ideas what is wrong? Is this a Linux problem?

BTW, I can build and run the entire system when built with nvcc, so there are no 'C'-only syntax or semantics hanging around. It just doesn't like __host__ __device__ in the header files.

Thanks in advance,

Mike

When I've seen errors like this, it's been an include file problem that somehow kept the compiler from seeing the CUDA include file that handles defining __host__ etc. Do you have your .h file protected against being included multiple times with an #ifdef? Last time it happened to me, I'd somehow managed to use the same #ifdef NAME in two files…

All our header files are protected using the #ifndef method. Besides, wouldn't I have seen the same problem before I started the modifications for CUDA? All I have to do is change two files from .c to .cu. One file has my __global__ void kernelMain() and the other file has been around for years and contains 2 existing functions which are called from my kernelMain. Then I add __host__ __device__ in front of the function signatures. The system builds with no errors and runs perfectly until I make these small changes.

I just wrote a really simple test from scratch, but in the same Linux/compiler environment. I have a cudaUtils.h and .cu. They contain one simple function that just adds two integer input params and returns the result. The CUDA kernel just calls this function. The 'add' function has the __device__ qualifier at the beginning of the signature.

Here’s the entire output from the compilation/linking:

nvcc -v -O3 -g -arch=compute_20 -D__CUDA_ARCH__=200 -DMSI_IS_LINUX --compiler-options -Wno-format --compiler-options -Wparentheses --compiler-options -Wreturn-type --compiler-options -Wsequence-point --compiler-options -Wswitch -DCUDA_PROCESSING --compiler-options -march=core2 --compiler-options -m64 --compiler-options -msse2 --compiler-options -mcx16 --compiler-options -msahf --compiler-options -pipe -D_DEBUG_ -D_DEBUG_PRINT_STDERR_NO_ -D_ARCH64_ -I../../../src/support -I../../../src/alerts -I../../../tools/libTools -I../../../src -I../../../src/outputs -I../../../src/inputs -I../../../src/tables -I../../../src/stats -I../../../src/mpeg -DHAVE_CONFIG_H -I. -c ../../../tools/tests/cudaMain.cu
#$ _SPACE_=
#$ CUDART=cudart
#$ _HERE_=/opt/cuda/bin
#$ _THERE_=/opt/cuda/bin
#$ _TARGET_SIZE_=64
#$ TOP=/opt/cuda/bin/..
#$ LD_LIBRARY_PATH=/opt/cuda/bin/../lib:/opt/cuda/bin/../extools/lib:
#$ PATH=/opt/cuda/bin/../open64/bin:/opt/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.2:/opt/cuda/bin:/var/qmail/bin
#$ INCLUDES="-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart"
#$ LIBRARIES= "-L/opt/cuda/bin/../lib64" -lcudart
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=200 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -D"__CUDA_ARCH__=200" -D"MSI_IS_LINUX" -D"CUDA_PROCESSING" -D"_DEBUG_" -D"_DEBUG_PRINT_STDERR_NO_" -D"_ARCH64_" -D"HAVE_CONFIG_H" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-4_cudaMain.cpp1.ii" "../../../tools/tests/cudaMain.cu"
#$ cudafe --m64 --gnu_version=40302 -tused --no_remove_unneeded_entities --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.c" --stub_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.gpu" --include_file_name "/tmp/tmpxft_00000b63_00000000-3_cudaMain.fatbin.c" "/tmp/tmpxft_00000b63_00000000-4_cudaMain.cpp1.ii"
#$ gcc -D__CUDA_ARCH__=200 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-5_cudaMain.cpp2.i" "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.gpu"
#$ cudafe --m64 --gnu_version=40302 --c --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.c" --stub_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.gpu" --include_file_name "/tmp/tmpxft_00000b63_00000000-3_cudaMain.fatbin.c" "/tmp/tmpxft_00000b63_00000000-5_cudaMain.cpp2.i"
#$ gcc -D__CUDA_ARCH__=200 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDABE__ -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" "/tmp/tmpxft_00000b63_00000000-6_cudaMain.cudafe2.gpu"
#$ filehash -s " " "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" > "/tmp/tmpxft_00000b63_00000000-8_cudaMain.hash"
#$ gcc -E -x c++ -D__CUDACC__ -C -Wno-format -Wparentheses -Wreturn-type -Wsequence-point -Wswitch -march=core2 -m64 -msse2 -mcx16 -msahf -pipe -O3 -I"../../../src/support" -I"../../../src/alerts" -I"../../../tools/libTools" -I"../../../src" -I"../../../src/outputs" -I"../../../src/inputs" -I"../../../src/tables" -I"../../../src/stats" -I"../../../src/mpeg" -I"." "-I/opt/cuda/bin/../include" "-I/opt/cuda/bin/../include/cudart" -D"__CUDA_ARCH__=200" -D"MSI_IS_LINUX" -D"CUDA_PROCESSING" -D"_DEBUG_" -D"_DEBUG_PRINT_STDERR_NO_" -D"_ARCH64_" -D"HAVE_CONFIG_H" -include "cuda_runtime.h" -m64 -g -o "/tmp/tmpxft_00000b63_00000000-9_cudaMain.cpp4.ii" "../../../tools/tests/cudaMain.cu"
#$ cudafe++ --m64 --gnu_version=40302 --parse_templates --gen_c_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.cpp" --stub_file_name "/tmp/tmpxft_00000b63_00000000-1_cudaMain.cudafe1.stub.c" "/tmp/tmpxft_00000b63_00000000-9_cudaMain.cpp4.ii"
#$ nvopencc -TARG:compute_20 -m64 -OPT:ftz=0 -CG:ftz=0 -CG:prec_div=1 -CG:prec_sqrt=1 "/tmp/tmpxft_00000b63_00000000-10_cudaMain" "/tmp/tmpxft_00000b63_00000000-7_cudaMain.cpp3.i" -o "/tmp/tmpxft_00000b63_00000000-2_cudaMain.ptx"
../../../tools/tests/cudaMain.cu(28): Error: External calls are not supported (found non-inlined call to _Z14cudaUtils_plusii)

--error 0x2 --

make: *** [cudaMain.o] Error 2

You can see that I specify -arch=compute_20, but it is acting as though only inlined device functions are supported. I'm using CUDA 4.0. If I do move the implementation to the header file and declare it as 'inline', it does compile and run correctly.
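The header-inline version that does compile looks roughly like this (a sketch; only the function name cudaUtils_plus is taken from the mangled symbol in the error message above):

```cuda
// cudaUtils.h (sketch): CUDA 4.0 has no linker for device code, so a
// __device__ function must be visible (and inlinable) in every
// translation unit that calls it -- i.e. defined in the header.
#ifndef CUDAUTILS_H
#define CUDAUTILS_H

__device__ inline int cudaUtils_plus(int a, int b)
{
    return a + b;
}

#endif
```

With the definition in the header, nvopencc can inline the call inside the kernel instead of emitting the external reference it complains about.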

Can anybody tell me what I’m doing wrong?
Thanks,
Mike