problem with the linker ?_? headache ~_~

Keither · July 28, 2009, 3:15pm

hi everyone

i have this program that does not do much, not least because it refuses to build. i have been fighting with it for the past four or so hours and it just won’t budge. perhaps someone here can tell a noob like me what in the world i am doing wrong, as i am sick and tired of it by now!!

[codebox]///////////////////////////////////////////////////////////////////////////////

// Name: fft_qpskDemod.cu

// Copyright: NUI Maynooth

// Author: Isa Iminov

// Date: 27/07/09

// Description:

///////////////////////////////////////////////////////////////////////////////

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

#include <math.h>

#include <cuda.h>

#include <cuda_runtime.h>

#include <cutil.h>

#include <cutil_inline.h>

#include <cufft.h>

//#include "kernel_fft_qpskDemod.cu"

//#include "structures.h"

///////////////////////////////////////////////////////////////////////////////

// Name: structures.h

// Copyright: NUI Maynooth

// Author: Isa Iminov

// Date: 17/07/09

// Description: Declaration of global types

///////////////////////////////////////////////////////////////////////////////

#ifndef STRUCTURES_H

#define STRUCTURES_H

///////////////////////////////////////////////////////////////////////////////

// Global types

///////////////////////////////////////////////////////////////////////////////

typedef struct {

float real;

float imag;

} Complex;

#endif

///////////////////////////////////////////////////////////////////////////////

// Name: kernel_fft_qpskDemod.cu

// Copyright: NUI Maynooth

// Author: Isa Iminov

// Date: 27/07/09

// Description:

///////////////////////////////////////////////////////////////////////////////

#ifndef KERNEL_QPSK2D_CU

#define KERNEL_QPSK2D_CU

////////////////////////////////////////////////////////////////////////////////

// Global types

////////////////////////////////////////////////////////////////////////////////

//#include "structures.h"

////////////////////////////////////////////////////////////////////////////////

// Kernel that copies real data to complex data

////////////////////////////////////////////////////////////////////////////////

__global__ void real2complex( Complex *d_b, cufftComplex *c, int R, int C ) {

/////////////////////////////////////////////////////////////////////////////

// Compute index for RxC array/data structure

/////////////////////////////////////////////////////////////////////////////

int ix = blockIdx.x * blockDim.x + threadIdx.x; // row of threads (single channel)

//int iy = blockIdx.y * blockDim.y + threadIdx.y;

//int idx = ix + iy * N;

/////////////////////////////////////////////////////////////////////////////

// Perform operation

/////////////////////////////////////////////////////////////////////////////

/*if ( ix < R && iy < C ) {

  c[idx].x = d_b[idx].real;

  c[idx].y = d_b[idx].imag;

} // end if

*/

for ( int iy = 0; iy < R; iy++ ) {

  c[ iy*C + ix ].x = d_b[ iy*C + ix ].real;

  c[ iy*C + ix ].y = d_b[ iy*C + ix ].imag;

} // end for

} // end real2complex

////////////////////////////////////////////////////////////////////////////////

// Kernel that copies complex data to real data

////////////////////////////////////////////////////////////////////////////////

__global__ void complex2real( cufftComplex *c, Complex *d_b, int R, int C ) {

/////////////////////////////////////////////////////////////////////////////

// Compute index for RxC array/data structure

/////////////////////////////////////////////////////////////////////////////

int ix = blockIdx.x * blockDim.x + threadIdx.x; // row of threads (single channel)

//int iy = blockIdx.y * blockDim.y + threadIdx.y;

//int idx = ix + iy * N;

/////////////////////////////////////////////////////////////////////////////

// Perform operation

/////////////////////////////////////////////////////////////////////////////

/*if ( idx < N && idy < N ) {

  d_b[idx].real = c[idx].x;

  d_b[idx].imag = c[idx].y;

} // end if

*/

for ( int iy = 0; iy < R; iy++ ) {

  d_b[ iy*C + ix ].real = c[ iy*C + ix ].x;

  d_b[ iy*C + ix ].imag = c[ iy*C + ix ].y;

} // end for

} // end complex2real

////////////////////////////////////////////////////////////////////////////////

// Kernel that performs QPSK 2D mapping in global memory

////////////////////////////////////////////////////////////////////////////////

__global__ void qpskModul( float *d_a, Complex *Lookup_d, Complex *d_b, int R, int C ) {

/////////////////////////////////////////////////////////////////////////////

// Compute index for RxC array/data structure

/////////////////////////////////////////////////////////////////////////////

int ix = blockIdx.x * blockDim.x + threadIdx.x; // row of threads (single channel)

//int iy = blockIdx.y * blockDim.y + threadIdx.y;

//int idx = ix + iy * N;

/////////////////////////////////////////////////////////////////////////////

// Perform mapping

/////////////////////////////////////////////////////////////////////////////

/*if ( ix < N && iy < N ) {

  d_b[ idx ].real = Lookup_d[ ( int )d_a[ idx ] ].real;

  d_b[ idx ].imag = Lookup_d[ ( int )d_a[ idx ] ].imag;

} // end if

*/

for ( int iy = 0; iy < R; iy++ ) {

  d_b[ iy*C + ix ].real = Lookup_d[ ( int )d_a[ iy*C + ix ] ].real;

  d_b[ iy*C + ix ].imag = Lookup_d[ ( int )d_a[ iy*C + ix ] ].imag;

} // end for

} // end qpskModul

////////////////////////////////////////////////////////////////////////////////

// Kernel that performs QPSK 2D demapping in global memory

////////////////////////////////////////////////////////////////////////////////

__global__ void qpskDemod( float *d_a, Complex *Lookup_d, Complex *d_b, int R, int C ) {

/////////////////////////////////////////////////////////////////////////////

// Compute index for RxC array/data structure

/////////////////////////////////////////////////////////////////////////////

int ix = blockIdx.x * blockDim.x + threadIdx.x; // row of threads (single channel)

//int iy = blockIdx.y * blockDim.y + threadIdx.y;

//int idx = ix + iy * N;

/////////////////////////////////////////////////////////////////////////////

// Perform demapping

/////////////////////////////////////////////////////////////////////////////

/*if ( ix < N && iy < N ) {

  if ( d_b[ idx ].real == -sqrt( 2.0 ) ) {

     if ( d_b[ idx ].imag == -sqrt( 2.0 ) ) d_a[ idx ] = 0;

     else d_a[ idx ] = 2;

     } // end if

  else {

     if ( d_b[ idx ].imag ==  sqrt( 2.0 ) ) d_a[ idx ] = 3;

     else d_a[ idx ] = 1;

     } // end else

  } // end if

*/

for ( int iy = 0; iy < R; iy++ ) {

  if ( d_b[ iy*C + ix ].real == -sqrt( 2.0 ) ) {

     if ( d_b[ iy*C + ix ].imag == -sqrt( 2.0 ) ) d_a[ iy*C + ix ] = 0;

     else d_a[ iy*C + ix ] = 2;

     } // end if

  else {

     if ( d_b[ iy*C + ix ].imag ==  sqrt( 2.0 ) ) d_a[ iy*C + ix ] = 3;

     else d_a[ iy*C + ix ] = 1;

     } // end else

} // end for

} // end qpskDemod

#endif

///////////////////////////////////////////////////////////////////////////////

// Main program

///////////////////////////////////////////////////////////////////////////////

#define DO_CPU

#undef DO_CPU

#define DO_GPU

//#undef DO_GPU

int main( int argc, char **argv ) {

printf( "fft and QPSK demod program for a 2D matrix…\n" );

printf( "=============================================\n" );

////////////////////////////////////////////////////////////////////////////

// Create timers

////////////////////////////////////////////////////////////////////////////

printf( "Initialising timers…\n" );

unsigned int timer1 = 0, timer2 = 0, timer3 = 0, timer4 = 0; // initialise every handle, not just the last one

cutCreateTimer( &timer1 ); // CPU timer

cutCreateTimer( &timer2 ); // GPU timer

cutCreateTimer( &timer3 ); // cudaMemcpy timer 1

cutCreateTimer( &timer4 ); // cudaMemcpy timer 2

////////////////////////////////////////////////////////////////////////////

// Initialise memory

////////////////////////////////////////////////////////////////////////////

printf( "Initialising memory…\n" );

const int R = 256; // make a big array with RxC elements

const int C = 128; // where R = row (channel), C = col (channel elements)

float *a_h; // host data

Complex *b_h, *Lookup; // host data structures

a_h = ( float * )malloc( sizeof( float )*R*C ); // allocate array on host

b_h = ( Complex * )malloc( sizeof( Complex )*R*C ); // allocate structure on host

Lookup = ( Complex * )malloc( sizeof( Complex )*4 ); // allocate structure on host

float *a_d; // device data

Complex *b_d, *Lookup_d; // device data structures

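// NOTE: cudaMallocHost returns page-locked host memory, not device memory;

// genuine device buffers would be allocated with cudaMalloc and released with cudaFree

cudaMallocHost( ( void ** ) &a_d, sizeof( float )*R*C ); // allocate array on device

cudaMallocHost( ( void ** ) &b_d, sizeof( Complex )*R*C ); // allocate structure on device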

cudaMallocHost( ( void ** ) &Lookup_d, sizeof( Complex )*4 ); // allocate structure on device

cufftComplex *r_complex_d;

cudaMallocHost( (void **) &r_complex_d, sizeof(cufftComplex)*R*C);

////////////////////////////////////////////////////////////////////////////

// Initialize host array

////////////////////////////////////////////////////////////////////////////

printf( "Initialising data matrix…\n" );

unsigned int data = 0xf01919c6; // random 4 bytes worth of data

unsigned int mask = 0x00000003; // bit mask

unsigned int temp = 0x00000000; // temporary storage

for ( int i = 0; i < R; i++ ) {

  for ( int j = 0; j < C; j++ ) {

     temp = data & mask;

     a_h[ i*C +  j ] = ( int )temp;

     data = data >> 2;

  } // end for

} // end for

////////////////////////////////////////////////////////////////////////////

// Initialise lookup table

////////////////////////////////////////////////////////////////////////////

printf( "Initialising lookup table…\n" );

Lookup[ 0 ].real = -sqrt( 2.0 ); Lookup[ 0 ].imag = -sqrt( 2.0 );

Lookup[ 1 ].real = sqrt( 2.0 ); Lookup[ 1 ].imag = -sqrt( 2.0 );

Lookup[ 3 ].real = sqrt( 2.0 ); Lookup[ 3 ].imag = sqrt( 2.0 );

Lookup[ 2 ].real = -sqrt( 2.0 ); Lookup[ 2 ].imag = sqrt( 2.0 );

printf( "\nPerforming cuda memory copy operation… " );

cutStartTimer( timer3 ); // start cudaMemcpy timer 3

cudaMemcpy( Lookup_d, Lookup, sizeof( Complex )*4, cudaMemcpyHostToDevice );

cutStopTimer( timer3 ); // stop cudaMemcpy timer 3

printf( " %f ms\n", cutGetTimerValue( timer3 ) );

cutResetTimer( timer3 );

#ifdef DO_CPU

////////////////////////////////////////////////////////////////////////////

// Do calculation on host

////////////////////////////////////////////////////////////////////////////

printf( "Performing QPSK 2D operation on CPU…\n" );

cutStartTimer( timer1 ); // start CPU timer

for ( int i = 0; i < R; i++ ) {

  for ( int j = 0; j < C; j++ ) {

     b_h[ i*C + j ] = Lookup[ ( int )a_h[ i*C + j ] ];

  } // end for

} // end for

cutStopTimer( timer1 ); // stop CPU timer

#endif

#ifdef DO_GPU

////////////////////////////////////////////////////////////////////////////

// Do calculation on device

////////////////////////////////////////////////////////////////////////////

cutStartTimer( timer2 ); // start GPU timer

printf( "Performing cuda memory copy operation… " );

cutStartTimer( timer3 ); // start cudaMemcpy timer 1

cudaMemcpy( a_d, a_h, sizeof( float )*R*C, cudaMemcpyHostToDevice );

cutStopTimer( timer3 ); // stop cudaMemcpy timer 1

printf( " %f ms\n", cutGetTimerValue( timer3 ) );

printf( "Initialising plan for CUDA fft…\n" );

cufftHandle plan;

cufftPlan2d( &plan, R, C, CUFFT_C2C);

printf( "Computing execution configuration…\n" );

int block_size_x = 16; // # of threads per block = x*y

int block_size_y = 32; // up to max of 512

dim3 blockDim( block_size_x, block_size_y ); // block size in threads

dim3 gridDim ( R/blockDim.x, C/blockDim.y ); // grid size in blocks

printf( "Performing configuration check…\n" );

if ( R % block_size_x != 0 ) gridDim.x += 1;

if ( C % block_size_y != 0 ) gridDim.y += 1;

qpskModul <<< gridDim, blockDim >>> ( a_d, Lookup_d, b_d, R, C );

real2complex <<< gridDim, blockDim >>> ( b_d, r_complex_d, R, C );

cufftExecC2C( plan, r_complex_d, r_complex_d, CUFFT_INVERSE );

cufftExecC2C( plan, r_complex_d, r_complex_d, CUFFT_FORWARD );

complex2real <<< gridDim, blockDim >>> ( r_complex_d, b_d, R, C );

qpskDemod <<< gridDim, blockDim >>> ( a_d, Lookup_d, b_d, R, C );

////////////////////////////////////////////////////////////////////////////

// Retrieve result from device and store it in host array

////////////////////////////////////////////////////////////////////////////

printf( "Performing cuda memory copy operation… " );

cutStartTimer( timer4 ); // start cudaMemcpy timer 2

cudaMemcpy( a_h, a_d, sizeof( float )*R*C, cudaMemcpyDeviceToHost );

cutStopTimer( timer4 ); // stop cudaMemcpy timer 2

printf( " %f ms\n", cutGetTimerValue( timer4 ) );

cutStopTimer( timer2 ); // stop GPU timer

#endif

////////////////////////////////////////////////////////////////////////////

// Print results

////////////////////////////////////////////////////////////////////////////

// printf( "\n x. y data value: real part: imag part:\n" );

// for ( int i = 0; i < R; i++ ) {

// for ( int j = 0; j < C; j++ ) {

// printf( "%3d.%3d %11f %12f %12f\n", i, j, a_h[ i*C + j ], b_h[ i*C + j ].real, b_h[ i*C + j ].imag );

// } //end for

// } // end for

printf( "\nCPU calculation time: %f ms\n", cutGetTimerValue( timer1 ) );

printf( "GPU operation time: %f ms\n", cutGetTimerValue( timer2 ) );

printf( "GPU calculation time: %f ms\n", cutGetTimerValue( timer2 )-cutGetTimerValue( timer3 )-cutGetTimerValue( timer4 ) );

printf( "\nLookup Table:\n" );

for ( int i = 0; i < 4; i++ ) printf( "%2d real:%10f imag:%10f\n", i, Lookup[ i ].real, Lookup[ i ].imag );

//printf( "\nGrid size: %2dx%2dx%2dx blocks (2D max)\nBlock size: %2dx%2dx%2dx threads (3D max)\n", gridDim, blockDim );

////////////////////////////////////////////////////////////////////////////

// Cleanup

////////////////////////////////////////////////////////////////////////////

printf( "\nCleanup memory…\n" );

free( a_h ); cudaFreeHost( a_d ); free( Lookup );

free( b_h ); cudaFreeHost( b_d ); cudaFreeHost( Lookup_d );

            cudaFreeHost( r_complex_d );

cufftDestroy( plan );

////////////////////////////////////////////////////////////////////////////

// Exit

////////////////////////////////////////////////////////////////////////////

printf( "Shutting down…" );

cutilExit( argc, argv );

} // end main

[/codebox]
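
A side note on the qpskDemod kernel in the listing: it compares floats for exact equality with ±sqrt( 2.0 ). cuFFT transforms are unnormalized, so after the CUFFT_INVERSE / CUFFT_FORWARD round trip every sample comes back multiplied by R*C (plus rounding error), and an equality test would not be expected to match. Below is a minimal host-side sketch of deciding the symbol from the signs instead. The helper names qpskDemapBySign and qpskDemapAll are hypothetical and not part of the original post; the symbol mapping is taken from the Lookup table in the listing.

```cpp
#include <cassert>

// Same layout as the Complex struct in the listing.
struct Complex {
    float real;
    float imag;
};

// Hypothetical helper: pick the 2-bit symbol from the signs of the two
// components, following the Lookup table in the listing:
//   0 -> (-,-), 1 -> (+,-), 2 -> (-,+), 3 -> (+,+)
// A component that is exactly zero is treated as positive.
inline int qpskDemapBySign(const Complex& s) {
    if (s.real < 0.0f) {
        return (s.imag < 0.0f) ? 0 : 2;
    }
    return (s.imag < 0.0f) ? 1 : 3;
}

// Demap n samples into a float array, mirroring the float symbol
// buffer (a_h / a_d) used in the listing.
inline void qpskDemapAll(const Complex* in, float* out, int n) {
    assert(in != nullptr && out != nullptr && n >= 0);
    for (int i = 0; i < n; ++i) {
        out[i] = static_cast<float>(qpskDemapBySign(in[i]));
    }
}
```

Because only the signs are inspected, the result does not depend on the R*C scale factor, so the same decision rule could be used inside the kernel body without dividing the data first.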

Output Window

Performing Custom Build Step

template.cu

tmpxft_00000ac8_00000000-3_template.cudafe1.gpu

tmpxft_00000ac8_00000000-8_template.cudafe2.gpu

tmpxft_00000ac8_00000000-3_template.cudafe1.cpp

tmpxft_00000ac8_00000000-13_template.ii

template.cu(243) : warning C4244: '=' : conversion from 'int' to 'float', possible loss of data

template.cu(252) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(252) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(253) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(253) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(254) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(254) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(255) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

template.cu(255) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

Linking…

template.obj : error LNK2019: unresolved external symbol _cufftDestroy@4 referenced in function _main

template.obj : error LNK2019: unresolved external symbol _cufftExecC2C@16 referenced in function _main

template.obj : error LNK2019: unresolved external symbol _cufftPlan2d@16 referenced in function _main

…/…/bin/win32/Debug/template.exe : fatal error LNK1120: 3 unresolved externals

Results

Build log was saved at “file://c:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA CUDA SDK\projects\myproject\Debug\BuildLog.htm”

template - 4 error(s), 9 warning(s)

tmurray · July 28, 2009, 4:31pm

you need to link against cufft.

Keither · July 29, 2009, 10:34am

this might seem a bit odd to you but how do i do that? i am fairly noob about this programming business…

BeachHut · July 29, 2009, 12:55pm

Probably add something like -L (path to CUFFT)/lib -lcufft

How did you get those “template.cu(252) : warning C4244: ‘=’ : conversion from ‘double’ to ‘float’, possible loss of data” warnings? Is that a new feature of v2.3 of the toolkit? It just silently converts them for me…

Keither · July 29, 2009, 1:42pm

you see, this tells nothing to a noob like me. i absolutely haven’t a clue how to do what you just described… to be honest with you, i just started using VC++2005 seven weeks ago and the only thing i know how to do is paste my code into template.cu and press build…

i don’t know how i got the C4244 warnings. i have had them from the very start, since i installed VC++2005 (trial version). i am using the CUDA SDK and toolkit v2.2, by the way.

thx for your reply, by the way. now it’s time to figure out what you meant by “-L (path to CUFFT)/lib -lcufft”

BeachHut · July 29, 2009, 1:57pm

I don’t use IDEs, so I don’t know how you change the compiler flags, but I’m sure it’s described in the documentation.

It will be compiling your code using something like:

nvcc -o <executable> <source>.cu

You want to add what I posted on the end of that (you’ll need to find out where your CUFFT is installed first, though).

Perhaps you should start with simpler codes first until you are more comfortable with what you are doing.

Keither · July 29, 2009, 2:22pm

i see, thanks for the info… i think that should help a bit

update:

with little luck in figuring out what and where i need to include (i’ll figure it out sooner or later), i decided to paste my code into one of the projects that already used the cufft library, and it worked… ok, the thing compiled alright, but execution is not impressive at the moment, as i get a crap load of errors during execution every time something from the cufft library gets called… time to do more study…

thanks for everyone’s help anyway

it is much appreciated
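
One thing that may be worth checking for the run-time errors mentioned in the update, offered as a guess rather than a diagnosis: the kernels in the listing index their arrays as [ iy*C + ix ] with no bounds guard, while the launch uses 16-wide blocks on a grid of R/16 = 16 blocks in x. That lets ix run up to 255 although C is only 128. A failure inside a kernel is usually only reported by a later API call, which might be why the cufft calls appear to be the ones failing. The host-side sketch below shows the arithmetic; ceilDiv, maxThreadIndexX and launchStaysInBounds are hypothetical helper names, not part of the original code.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed to cover n items.
inline int ceilDiv(int n, int block) {
    assert(n >= 0 && block > 0);
    return (n + block - 1) / block;
}

// Largest value of blockIdx.x * blockDim.x + threadIdx.x in a launch
// with gridX blocks of blockX threads along x.
inline int maxThreadIndexX(int gridX, int blockX) {
    assert(gridX > 0 && blockX > 0);
    return gridX * blockX - 1;
}

// True if every ix produced by the launch is a valid column index,
// i.e. an unguarded access to [iy*C + ix] stays inside row iy.
inline bool launchStaysInBounds(int gridX, int blockX, int C) {
    return maxThreadIndexX(gridX, blockX) < C;
}
```

With the values from the listing, launchStaysInBounds( 16, 16, 128 ) is false, whereas a grid of ceilDiv( 128, 16 ) = 8 blocks covers the 128 columns exactly. When C is not a multiple of the block width, the rounded-up grid overshoots, so the kernel would also need an `if ( ix < C )` guard.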

chrisso · March 11, 2010, 6:39am

You have to add cufft.lib … to the linker inputs (Additional Dependencies)
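
For anyone who would rather not edit the project settings: with the Microsoft compiler, a library can also be requested from the source file itself. This is a sketch under two assumptions that were not tested against the setup in this thread: that the toolkit's lib directory is already on the linker's library search path, and that the pragma survives nvcc's handling of a .cu file (putting it in an ordinary .cpp file of the same project avoids relying on that).

```cpp
// MSVC-specific: asks the linker to pull in the named import libraries,
// equivalent in effect to listing them under Additional Dependencies.
// Other compilers ignore this pragma (possibly with a warning).
#pragma comment(lib, "cufft.lib")
#pragma comment(lib, "cudart.lib")
```

On a command-line build the same thing is what BeachHut's `-lcufft` does.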
