Preprocessor macro for host code?

There is a preprocessor macro, __DEVICE_EMULATION__, that lets us put debugging statements into functions so that they run in emulation mode only.

Similarly, I’m finding it very useful for debugging purposes to define “gold standard” versions of my algorithms that execute on the CPU for results checking. I am therefore declaring most of my functions as __device__ __host__, and I was wondering whether there is a preprocessor macro that would let certain debugging statements be emitted only in the host version.

Just define your own. There is no magic in the __DEVICE_EMULATION__ symbol; it is just passed to the compiler from the Makefile using the -D option.

That’s interesting…I had no idea that constants could be defined in this way!

However, I don’t see how it can apply to this issue, because it is not a matter of compiler flags. When the nvcc compiler encounters the __host__ and __device__ qualifiers, it must internally know to split off and compile two different versions of the function. So the constant would have to be defined based on which version of the function is being compiled, not on the compilation flags…

It has been like that since about 1973, when the C preprocessor was introduced, partly at the urging of Alan Snyder.

Presumably you have separate Makefile entries for building and linking the CPU and CUDA versions of your code. If so, then by defining the symbol that controls the inclusion of debugging statements only in the CPU build, you will get the conditional compilation you want. Something like

#include <stdio.h>

__device__ __host__ void foo(void)
{
#ifdef __HOST_DEBUG__
    /* compiled in only when -D__HOST_DEBUG__ is passed on the command line */
    fprintf(stderr, "this is a host debug statement\n");
#endif
}

then to compile for the CUDA version

nvcc -c -o foo.cu.o foo.cu

and for the gold version (which I guess gets passed through to the host compiler anyway)

nvcc -D__HOST_DEBUG__ -c -o foo.gold.o foo.cu

Won’t that work?

No, it’s not two separate compilations. The __host__ __device__ qualifiers just cause two versions of the same function to be generated within a single compilation. For example, in the following code “foo” is compiled into both a device version and a host version, and both “Compute” and “Compute_Gold” are compiled at the same time.

__host__ __device__ float foo(float a, float b)
{
    return a + b*10.0f;
}

__global__ void kernel_func()
{
    foo(1.0f, 2.0f);
}

extern "C" void Compute()
{
    kernel_func<<<1,1>>>();
}

extern "C" void Compute_Gold()
{
    foo(1.0f, 2.0f);
}
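
For what it's worth, one way to get a per-version switch inside a single compilation is the __CUDA_ARCH__ macro, which the CUDA programming guide describes as defined only while nvcc compiles the device version of the code. The sketch below assumes your toolkit provides that macro; if it does not, this will not help. The host_calls counter and the empty fallback definitions of __host__ and __device__ (so that the file also builds with a plain C++ compiler) are my own additions for illustration, not something from the posts above.

```cpp
#include <cstdio>

// Assumption for illustration: when the file is NOT built by nvcc, make the
// CUDA qualifiers expand to nothing so an ordinary C++ compiler accepts it.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical counter: records how often the host version of foo ran.
static int host_calls = 0;

__host__ __device__ float foo(float a, float b)
{
#ifndef __CUDA_ARCH__
    // __CUDA_ARCH__ is undefined here, so this is the host version:
    // host-only debugging statements can go in this branch.
    ++host_calls;
    fprintf(stderr, "host debug: foo(%f, %f)\n", a, b);
#endif
    return a + b * 10.0f;
}
```

Calling foo from Compute_Gold would then print the debug line, while the copy called from kernel_func is compiled without it, and no extra -D flag is needed.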