How to "wrap" function calls for running in CUDA ? Wrapping functions from ordinary C++...

How to "wrap" function calls for running in CUDA?

Could anyone show me how to "wrap" function calls so that they are executed on the GPU?

I have a C++ program that already runs fine, but I want to run some portions of it on the GPU. As you know, the functions that run as CUDA kernels on the GPU must be compiled by nvcc.

I want to compile only that portion with nvcc, and the rest with ordinary gcc.

I am using Linux, and I am a newbie. I would really appreciate an explanation of the steps to compile.

Thank you

PS: Could you provide some simple wrapper code and compile instructions?

I include a program here that does a kind of Fourier transformation by direct summation. The code runs both on the CPU and the GPU: the GPU (8600 GTS) is about 160x faster than the CPU (Athlon X2 5600+). Using Intel's Vector Math Library the CPU code can be made to run about 2x faster, but the CPU is still about 80x slower for sufficiently large problems. This is a real-world example from crystallography; it runs at about 46 GFlops on the GPU (counting 4 flops each for cos/exp/sin) and 2.8 GFlops on the CPU (56 flops each for cos/exp/sin). This is close to peak for 1 flop/cycle on both the CPU and the GPU. There are good reasons, btw, for not using FFTs for this.

GPU:

__global__ void gpuSUMAniso2
    (float *A, float *B,
     _F *H, _F *K, _F *L,
     _F x0, _F y0, _F z0, _F q0,
     _F b00, _F b01, _F b02, _F b03, _F b04, _F b05,
     _F x1, _F y1, _F z1, _F q1,
     _F b10, _F b11, _F b12, _F b13, _F b14, _F b15,
     _F *F0, _F *F1, _T size)
{
    float U0, U1, f0, f1, g0, g1;
    unsigned int i;
    /* Grid-stride loop: each thread handles every tsz-th element,
       so the kernel works for any size and launch configuration. */
    _T tid = blockDim.x * blockIdx.x + threadIdx.x;
    _T tsz = blockDim.x * gridDim.x;
    for (i = tid; i < size; i += tsz)
    {
        U0 = H[i] * x0 + K[i] * y0 + L[i] * z0;
        U1 = H[i] * x1 + K[i] * y1 + L[i] * z1;
        f0 = b00 * H[i] * H[i] + b01 * K[i] * K[i] + b02 * L[i] * L[i];
        g0 = b03 * H[i] * K[i] + b04 * H[i] * L[i] + b05 * K[i] * L[i];
        f1 = b10 * H[i] * H[i] + b11 * K[i] * K[i] + b12 * L[i] * L[i];
        g1 = b13 * H[i] * K[i] + b14 * H[i] * L[i] + b15 * K[i] * L[i];
        f0 = F0[i] * q0 * expf(-(f0 + 2.f * g0));
        f1 = F1[i] * q1 * expf(-(f1 + 2.f * g1));
        A[i] += (f0 * cosf(U0) + f1 * cosf(U1));
        B[i] += (f0 * sinf(U0) + f1 * sinf(U1));
    }
}
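
To get at the original question: the kernel above still has to be launched from host code, and that host code is the natural place to put the "wrapper". Below is a minimal sketch of such a wrapper. It is my own addition, not part of the measured program: the wrapper name is made up, the 128-block/256-thread launch configuration is an arbitrary example, and error checking is omitted for brevity. It goes in the same .cu file as the kernel (so nvcc sees the <<<>>> launch syntax and the _F/_T typedefs given at the end of this post) and is declared extern "C" so that gcc-compiled code can call it like any plain function:

extern "C"
void gpuSUMAniso2_wrapper
    (float *A, float *B,
     _F *H, _F *K, _F *L,
     _F x0, _F y0, _F z0, _F q0,
     _F b00, _F b01, _F b02, _F b03, _F b04, _F b05,
     _F x1, _F y1, _F z1, _F q1,
     _F b10, _F b11, _F b12, _F b13, _F b14, _F b15,
     _F *F0, _F *F1, _T size)
{
    const size_t bytes = size * sizeof(float);
    float *dA, *dB, *dH, *dK, *dL, *dF0, *dF1;

    /* Device copies of the seven arrays (error checking omitted). */
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dH, bytes);
    cudaMalloc((void **)&dK, bytes);
    cudaMalloc((void **)&dL, bytes);
    cudaMalloc((void **)&dF0, bytes);
    cudaMalloc((void **)&dF1, bytes);

    /* A and B are accumulated into, so they must be copied up as well. */
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dH, H, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dK, K, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dL, L, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dF0, F0, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dF1, F1, bytes, cudaMemcpyHostToDevice);

    /* 128 blocks of 256 threads is just an example; the grid-stride
       loop in the kernel copes with any problem size. */
    gpuSUMAniso2<<<128, 256>>>(dA, dB, dH, dK, dL,
                               x0, y0, z0, q0,
                               b00, b01, b02, b03, b04, b05,
                               x1, y1, z1, q1,
                               b10, b11, b12, b13, b14, b15,
                               dF0, dF1, size);

    /* Copy the results back; this cudaMemcpy also waits for the kernel. */
    cudaMemcpy(A, dA, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(B, dB, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dH); cudaFree(dK);
    cudaFree(dL); cudaFree(dF0); cudaFree(dF1);
}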

CPU:

void cpuSUMAniso2
    (float *A, float *B,
     _F *H, _F *K, _F *L,
     _F x0, _F y0, _F z0, _F q0,
     _F b00, _F b01, _F b02, _F b03, _F b04, _F b05,
     _F x1, _F y1, _F z1, _F q1,
     _F b10, _F b11, _F b12, _F b13, _F b14, _F b15,
     _F *F0, _F *F1, _T N)
{
    float U0, U1, f0, f1, g0, g1;
    unsigned int i;
    /* Same arithmetic as the kernel; a plain serial loop replaces
       the grid-stride loop. */
    for (i = 0; i < N; i++)
    {
        U0 = H[i] * x0 + K[i] * y0 + L[i] * z0;
        U1 = H[i] * x1 + K[i] * y1 + L[i] * z1;
        f0 = b00 * H[i] * H[i] + b01 * K[i] * K[i] + b02 * L[i] * L[i];
        g0 = b03 * H[i] * K[i] + b04 * H[i] * L[i] + b05 * K[i] * L[i];
        f1 = b10 * H[i] * H[i] + b11 * K[i] * K[i] + b12 * L[i] * L[i];
        g1 = b13 * H[i] * K[i] + b14 * H[i] * L[i] + b15 * K[i] * L[i];
        f0 = F0[i] * q0 * expf(-(f0 + 2.f * g0));
        f1 = F1[i] * q1 * expf(-(f1 + 2.f * g1));
        A[i] += (f0 * cosf(U0) + f1 * cosf(U1));
        B[i] += (f0 * sinf(U0) + f1 * sinf(U1));
    }
}

As you can see, the difference is minimal. _F, btw, is a typedef for const float, and _T for const unsigned int:

typedef const float _F;
typedef const unsigned int _T;
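
As for the compile question at the top of the thread: the usual pattern is to put the kernel and the wrapper in a .cu file and everything else in ordinary .cpp files. The C++ side only needs a declaration of the wrapper with the same signature (it can spell out const float / const unsigned int instead of repeating the _F/_T typedefs) and calls it like any other function. A sketch of the build, using made-up file names sumaniso.cu and main.cpp, and g++ rather than bare gcc since the host code is C++ (the -L path is the default 64-bit Linux install location; adjust it to wherever your CUDA toolkit lives):

nvcc -O2 -c sumaniso.cu
g++ -O2 -c main.cpp
g++ main.o sumaniso.o -o myprog -L/usr/local/cuda/lib64 -lcudart

nvcc compiles the device code itself and hands the host parts of the .cu file to the system compiler, so only the final link step needs to know about the CUDA runtime library.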