Splitting a CUDA program into several files: function defined in another file is not found

System:

GeForce GTX 470
CUDA Toolkit 3.2 (January 2011)

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Nov__3_16:16:57_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221

Problem:

The following program is intended to use CUFFT on a GTX 470 and is to be built from several files. It has been reduced to a minimal two-file form that reproduces the error. When the two files are merged into one, the program compiles and works. With two files, however, the linker does not find the called function:

nvcc -c -arch compute_20 x1.cu
nvcc -c x.c
nvcc x.o x1.o -lcufft
x.o: In function `main':
x.c:(.text+0x79): undefined reference to `mpmul'
collect2: ld returned 1 exit status

cat x.c
#include <stdio.h>

void mpmul(unsigned char * w, unsigned char * u, unsigned char * v, int n, int m);

int main (void)
{
     unsigned char a [] = { 3, 1, 4, 1, 5, 9, };
     unsigned char b [] = { 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, };
     unsigned char c [1000];
     int i;

     mpmul (c-1, a-1, b-1, 6, 10);

     for (i = 0; i < 16; i++) printf (" %d", c [i]);
     return 0;
}

cat x1.cu
#include <stdio.h>
#include <math.h>
#include <cuda.h>
#include <cufft.h>

#define bs 32
#define gs 256

char * cufftErrStr (int i)
{

}

__global__ void cuSet0 (cufftDoubleReal * a, int n)
{
     int i;
     for
     (    i = blockIdx.x * blockDim.x + threadIdx.x; // start in grid 0
          i < n;                                      // go to end of array
          i += blockDim.x * gridDim.x                 // jump to next grid
     )
     {
          a [i] = 0;
     }
}

void mpmul(unsigned char * w, unsigned char * u, unsigned char * v, int n, int m)
{
     int mn,nn;
     char * U, * V;
     cufftDoubleComplex * A;
     cufftDoubleReal * a;

     mn = m>=n ? m : n;
     nn = 1; while (nn < mn) nn <<= 1; nn <<= 1;

     cudaMalloc ((void**) &U, sizeof (char) * n);
     cudaMalloc ((void**) &V, sizeof (char) * m);

     cudaMalloc ((void**) &a, sizeof (cufftDoubleReal) * nn * 2);
     cudaMalloc ((void**) &A, sizeof (cufftDoubleComplex) * nn * 2);

     cudaMemcpy (U, u+1, n*sizeof(char), cudaMemcpyHostToDevice);
     cudaMemcpy (V, v+1, m*sizeof(char), cudaMemcpyHostToDevice);
    
     cuSet0 <<<bs, gs>>> (a, 2*nn);

}

Perhaps the following helps with the analysis: when the object file is listed with nm, the name mpmul does not appear there in plain form:

nm x1.o
0000000000000000 W _Z10cudaLaunchIcE9cudaErrorPT_
0000000000000000 T _Z11cufftErrStri
00000000000012cb T _Z26__device_stub__Z6cuSet0PdiPdi
0000000000000067 T _Z5mpmulPhS_S_ii
0000000000001349 T _Z6cuSet0Pdi

00000000000011ef W umin

I would be very grateful if anybody could kindly help me with the problem. Thank you.

(Edit: removed the unimportant parts of the question after having solved it)

.cu files are compiled as C++, not as C. To call a function defined in a .cu file from C, the function must be declared extern "C" so that it uses C linkage conventions and its name is not mangled.
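To see what the mangled names in the nm listing stand for, they can be decoded. The following is only a minimal sketch, assuming GCC or Clang, whose <cxxabi.h> provides abi::__cxa_demangle. The helper name demangle is made up for illustration and does not come from the thread.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang specific)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the readable form of a mangled C++ symbol,
// or the input unchanged if the demangler rejects it.
std::string demangle(const char* symbol)
{
    int status = 0;
    char* readable = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return symbol;
    }
    std::string result(readable);
    std::free(readable);   // the demangler allocates the buffer with malloc
    return result;
}
```

Calling demangle("_Z5mpmulPhS_S_ii") should yield the C++ signature of mpmul, with three unsigned char pointers and two ints. That shows x1.o exports a C++ symbol rather than the plain name mpmul that x.o asks for. On the command line, piping the nm output through c++filt does the same kind of decoding.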

I tried to implement this solution, but it does not work either.

New compilation commands (compiling as C code does not accept extern "C"):

nvcc -x c++ -c x.c
nvcc -c -arch compute_20 x1.cu
nvcc x.o x1.o

This still results in

nvcc x.o x1.o
x.o: In function `main':
x.c:(.text+0x79): undefined reference to `mpmul'
collect2: ld returned 1 exit status

Changed x.c:

cat x.c
#include <stdio.h>

extern "C"
void mpmul(unsigned char * w, unsigned char * u, unsigned char * v, int n, int m);

int main (void)
{
     unsigned char a [] = { 3, 1, 4, 1, 5, 9, };
     unsigned char b [] = { 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, };
     unsigned char c [1000];
     int i;

     mpmul (c-1, a-1, b-1, 6, 10);

     for (i = 0; i < 16; i++) printf (" %d", c [i]);
     return 0;
}

You need brackets around the declaration of mpmul:

extern "C" {
mpmul(…)
}

Mox

No, that’s not it. The braces are optional when there is only one declaration. But I solved the problem with your first hint: The file with the main must be a C++ file. Renaming it x.cc (or compiling with -x c++) does it.
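For reference, the two spellings of the linkage specification can be put side by side. This is just a sketch with made-up function names (twice, thrice), not code from the program above.

```cpp
// Single declaration: no braces required.
extern "C" int twice(int x);

// Block form: convenient when several declarations share C linkage.
extern "C" {
    int thrice(int x);
}

// The definitions keep the C linkage given by the earlier declarations,
// so both functions are exported under their plain, unmangled names.
int twice(int x)  { return 2 * x; }
int thrice(int x) { return 3 * x; }
```

Either form only helps if the compiler translating the file is a C++ compiler. A C compiler does not know the extern "C" syntax at all.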

That raises another question: How can one write a real C file with main in it and call from there the C++ functions with the device code?

(And please excuse me for posting all the files inline. I discovered the forum's attach button only just now.)

Just use extern "C" in the C++ subroutine file and compile it with the C++ compiler, compile the C main with the C compiler, and then link with the C++ compiler, including the CUDA libraries you need.

In the C++ subroutine file

extern "C" void wrapper(void)
{
  // C++ code goes here
}

in the C main

void wrapper(void);

int main(void)
{
  wrapper();
  return 0;
}
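A common way to keep the declaration identical on both sides is a shared header that hides extern "C" from the C compiler. The sketch below shows that idiom in one piece. The header name wrapper.h, the int parameter and the summing body are assumptions made for illustration only.

```cpp
// --- contents of a hypothetical shared header, e.g. wrapper.h ---
#ifdef __cplusplus
extern "C" {            // only the C++/CUDA compiler sees this line
#endif

int wrapper(int n);     // hypothetical C-callable entry point

#ifdef __cplusplus
}
#endif

// --- contents of the C++ (or .cu) file that implements it ---
#include <numeric>
#include <vector>

int wrapper(int n)
{
    // Any C++ (or CUDA host) code may live behind the C interface.
    if (n <= 0) return 0;
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 1);   // fill with 1, 2, ..., n
    return std::accumulate(values.begin(), values.end(), 0);
}
```

The C main then includes the same header and calls wrapper like any C function. The build follows the recipe above: for example gcc -c for the C file, nvcc -c for the .cu file, and nvcc for the final link.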

Thank you, everybody. With your advice I was able to solve the problem. In the CUDA files, the functions called from outside have to be marked with
extern "C"
void doSomething (void) { doWhatSoEver(); }