Combined __host__ __device__ functions: how to tell whether the device or the host is executing?

Hi,

I am writing code that will run partly on the CPU and partly on the GPU, so I want to be able to write the code once for both. 99% of the code will be the same on the CPU and the GPU, but the innermost code needs to differ. Is there some way to do this, e.g. using the preprocessor? What I'm after is something like this:

[codebox]#ifdef DEVICE_CODE
arr[threadIdx.x] = arr[threadIdx.x + 1];
#else
for(int i = 0; i < testSize; i++)
{
    arr[i] = arr[i + 1];
}
#endif[/codebox]

Is there a way to do this? Since it is only a very small amount of code at the centre of the algorithm which will be different, I don’t want to have to write everything twice.

Thanks

Raffles

It has been a regularly requested feature, but as of CUDA 2.3 I am pretty sure the answer is still that there is no nvcc-generated symbol that would let you do what you want. There are ways around the problem, but they are not that pretty.

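You can do it with CUDA 3.0:

[codebox]__host__ __device__ void foo(int *arr, int testSize)
{
#ifdef __CUDA_ARCH__
    // compiled for the device: one element per thread
    arr[threadIdx.x] = arr[threadIdx.x + 1];
#else
    // compiled for the host: plain serial loop
    for (int i = 0; i < testSize; i++)
    {
        arr[i] = arr[i + 1];
    }
#endif
}[/codebox]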
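A fuller sketch of the CUDA 3.0 approach, for reference. The names `shift_left` and `shift_kernel` are hypothetical, and the device branch assumes a single block with at least `testSize` threads and an array of `testSize + 1` ints. Two deliberate differences from the snippet above: the device branch has every thread read into a register and then call `__syncthreads()` before writing, since otherwise thread i may read `arr[i + 1]` after thread i + 1 has already overwritten it; and the `#ifndef __CUDACC__` shim at the top is only there so the same file also builds with a plain C++ compiler (under nvcc it does nothing).

```cpp
// shift.cu: one __host__ __device__ function with two code paths.
// Hypothetical names; needs CUDA 3.0 or later for __CUDA_ARCH__.

#ifndef __CUDACC__
// Plain C++ compiler: make the CUDA qualifiers disappear.
#define __host__
#define __device__
#endif

__host__ __device__ inline void shift_left(int *arr, int testSize)
{
#ifdef __CUDA_ARCH__
    // Device pass. Assumes one block with blockDim.x >= testSize
    // and arr holding testSize + 1 ints.
    int i = threadIdx.x;
    int v = (i < testSize) ? arr[i + 1] : 0;  // every thread reads first...
    __syncthreads();                          // ...before any thread writes
    if (i < testSize)
        arr[i] = v;
#else
    // Host pass: the same operation as a serial loop.
    for (int i = 0; i < testSize; i++)
        arr[i] = arr[i + 1];
#endif
}

#ifdef __CUDACC__
// Only nvcc sees the kernel; its call to shift_left is compiled
// in the device pass, where __CUDA_ARCH__ is defined.
__global__ void shift_kernel(int *arr, int testSize)
{
    shift_left(arr, testSize);
}
#endif
```

Host code calls `shift_left(h_arr, testSize)` directly and gets the loop; on the GPU you would launch something like `shift_kernel<<<1, testSize>>>(d_arr, testSize)` with `d_arr` a device allocation of `testSize + 1` ints.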

Excellent news! Roll on the 3.0 release…

OK, I see it in 3.0… damn, though, I'm using 2.3 for a uni project.

Stopgap:

  1. Put the host main program and the kernel into separate files, e.g. host.cu and kernel.cu.

  2. Create a kernel.h containing the prototype of the kernel and #include "kernel.h" in host.cu. Do not #include "kernel.cu"; this is important.

  3. Move all your host/device functions out into header files.

  4. #include those header files in both host.cu and kernel.cu.

  5. At the TOP of kernel.cu, #define __CUDA_ARCH__ (i.e. before any of the #includes).

If you do this, all the files #included from kernel.cu will see __CUDA_ARCH__ defined. If you move to 3.0 in the future, you can just remove the #define from the top of kernel.cu.

This only works if all your host/device functions are defined in header files that you #include at the top of kernel.cu and host.cu. Hope this makes sense.

Cheers

Raffles

Just found out you CAN do it in 2.3: use __CUDACC__ instead. See the example in section 3.3.5 of the programming manual. For some reason it isn't mentioned anywhere else in that manual or in the reference manual!

I haven't tested it much, but it seems to work on the simple examples I've tried.

Cheers

Isn't __CUDACC__ just a check for whether the compiler is nvcc or not?

Sure, but it means you can have the same header file compiled with both your regular C/C++ compiler and nvcc, and get different code included in each. So you can have, e.g.:

    [*]function “outer” called by either the kernel or the C++ main program

    [*]functions middle1, middle2 etc. called by “outer”

    [*]function "inner", called by one or more of the other functions, whose definition depends on whether __CUDACC__ is defined

The __CUDACC__ version of inner can reference CUDA-specific things such as threadIdx; the other version just uses vanilla C code. You need a couple of extra bits to make it compile (e.g. you need to define __host__ and __device__ as empty macros for the regular compiler), but it basically works. The downside is that you are now forced to use your regular C/C++ compiler alongside nvcc, instead of having everything in CUDA. Hope that makes sense; I haven't tried it out in practice yet!
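A minimal sketch of such a header, with hypothetical names `outer`, `middle1` and `inner`, assuming `inner` does the same element shift as the question and that the device version runs in a single block with at least `testSize` threads. One deliberate difference from the description: under nvcc the shared functions are marked `__device__` only rather than `__host__ __device__`, because nvcc also defines `__CUDACC__` while it compiles the host side, so an nvcc-built host copy of `outer` would end up calling the threadIdx version of `inner`. For the regular compiler, the `#ifndef __CUDACC__` line turns `__device__` into nothing.

```cpp
// common.h: included by kernel.cu (built with nvcc) and by the
// C++ main program (built with the regular compiler). Hypothetical names.
#ifndef COMMON_H
#define COMMON_H

#ifndef __CUDACC__
#define __device__   // regular compiler: qualifier becomes a no-op
#endif

// inner: the only function whose body differs between the two builds.
#ifdef __CUDACC__
__device__ inline void inner(int *arr, int testSize)
{
    // GPU: one element per thread; assumes one block, >= testSize threads.
    int i = threadIdx.x;
    int v = (i < testSize) ? arr[i + 1] : 0;  // read first...
    __syncthreads();                          // ...then write
    if (i < testSize)
        arr[i] = v;
}
#else
inline void inner(int *arr, int testSize)
{
    // CPU: plain serial loop.
    for (int i = 0; i < testSize; i++)
        arr[i] = arr[i + 1];
}
#endif

// Written once, compiled by both toolchains.
__device__ inline void middle1(int *arr, int testSize) { inner(arr, testSize); }
__device__ inline void outer(int *arr, int testSize) { middle1(arr, testSize); }

#endif // COMMON_H
```

kernel.cu would then include this header and define something like `__global__ void kernel(int *arr, int testSize) { outer(arr, testSize); }`, while the C++ main program includes the same header and calls `outer` directly. As the post says, every CPU-side caller has to be built by the regular compiler for this to work.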

What you cannot do with __CUDACC__ is give one function different host and device behaviour within the same .cu file, because nvcc defines __CUDACC__ for both the host and the device parts of the compilation.

This #ifdef __CUDA_ARCH__ seems like a somewhat strange construct, since it would have to survive the initial preprocessing and be handled by a parser that knows whether or not it is inside a device function. I wonder how it is implemented.