Using a function both in cpp & device code

I would like to ask whether there might be a problem when using a function in both cpp & device code.

For example, assume that I have 3 files:

  1. A header file, foo.h, that has the following code:

MYAPI inline float foo(float x) { return x*x; }

  2. A foo.cpp file that includes the header file like this:

#define MYAPI
#include "foo.h"

void test(float *r) { for (int k=0; k<100; k++) r[k]=foo((float)k); }

  3. A kernel file, foo.cu, that again includes the header file:

#define MYAPI __device__
#include "foo.h"

__global__ void testkernel(float *r) { int idx=threadIdx.x; if (idx<100) r[idx]=foo((float)idx); }

Assume that the kernel is launched with an appropriate thread/block size and that r points
to a properly allocated 100-entry array in both the host and device cases.

So, do you think that the above definitions are ok? I am asking because of a problem
I am experiencing with my cpp integration code, which gives me some inconsistent behaviour.

thanks in advance

Functions invoked from a kernel must be declared __device__.
In your code you are trying to call a host function from the kernel. This is not allowed.

I can assure you that with the above code there’s no complaint from the compiler. ;)

Hi,
Yes, your code seems to be ok… I didn’t notice the preprocessor directives that make it a host or device function.
I have attached a solution using your code and it is working fine. Check the kernel invocation and the device-to-host copy; maybe you have some issues in those areas. Otherwise the code looks ok.
Sample1.zip (2.77 KB)
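
As a side note, here is a minimal sketch of a variation on the header from the question. Instead of each source file defining MYAPI before the include, the header picks the qualifier itself by testing __CUDACC__, which nvcc defines when it compiles CUDA source. Marking foo as __host__ __device__ is my assumption about what is wanted here (one definition callable from both sides); the include guard name FOO_H is made up for illustration.

```cpp
// foo.h (sketch): one header usable from both .cpp and .cu files.
#ifndef FOO_H   // hypothetical include guard name
#define FOO_H

// nvcc defines __CUDACC__ when compiling CUDA source files;
// a plain host C++ compiler does not, so MYAPI expands to nothing there.
#ifdef __CUDACC__
#define MYAPI __host__ __device__
#else
#define MYAPI
#endif

// Same function as in the question, compiled for host and device under nvcc.
MYAPI inline float foo(float x) { return x * x; }

#endif // FOO_H

// foo.cpp side: ordinary host code calling foo(), as in the question.
void test(float *r) {
    for (int k = 0; k < 100; k++) r[k] = foo((float)k);
}
```

With this layout neither foo.cpp nor foo.cu needs its own #define MYAPI line, so the two translation units cannot accidentally disagree about how the header is configured.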

Thank you for taking the time to make the sample.

Well, I have to say that in principle it works, but in a large project there’s a point after which I get “no results” from CUDA (with no runtime errors appearing whatsoever). I am trying various ideas to find the root of the problem; mixing cpp/device code seems like a good candidate for problems…

thanks!
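
On the "no results, no errors" symptom: kernel launches are asynchronous and a failed launch reports nothing unless the error state is queried, so it may be worth checking every call explicitly. Below is a rough sketch of that idea, not a diagnosis of the actual project. checkCuda and runTest are hypothetical helper names, the 1 x 128 launch configuration is an assumption, and the index computation is generalized with blockIdx/blockDim.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error string and returns false on failure.
static bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__device__ inline float foo(float x) { return x * x; }

__global__ void testkernel(float *r) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < 100) r[idx] = foo((float)idx);
}

// Hypothetical driver: runs the kernel and checks each step.
// hostOut must point to at least 100 floats.
bool runTest(float *hostOut) {
    float *d_r = nullptr;
    if (!checkCuda(cudaMalloc(&d_r, 100 * sizeof(float)), "cudaMalloc")) return false;

    testkernel<<<1, 128>>>(d_r);  // assumed launch configuration

    // cudaGetLastError catches launch failures (bad configuration etc.);
    // cudaDeviceSynchronize surfaces errors raised while the kernel runs.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch")
           && checkCuda(cudaDeviceSynchronize(), "kernel execution")
           && checkCuda(cudaMemcpy(hostOut, d_r, 100 * sizeof(float),
                                   cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(d_r);
    return ok;
}
```

If one of these checks fires in the large project, the message at least narrows down which step stops producing results.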
