Mixing device and host code is not nice

I sometimes want to implement the same algorithm on both the host and the device, where most of the code is identical but a few details differ. For example, an algorithm that reads interpolated texture values should use a texture read on the device, whereas the host code has to implement its own interpolation. The most obvious solution to me would be two implementations of the same function, one used by the host and one by the device, as shown in FAIL 1 below. Such a setup would let both the host-specific and the device-specific code call functions that are shared between host and device. Unfortunately, this does not compile.
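To make the "own interpolation" on the host concrete, here is a minimal sketch of a trilinear lookup in plain C++. The VoxelGrid struct and the names trilinear_host and lerp1 are hypothetical, as is passing the grid explicitly (the interpolate_voxel_host in the snippets below takes only x, y, z). It assumes unnormalized coordinates, linear filtering and clamp addressing. The texture hardware uses low-precision fixed-point weights, so results will likely not match tex3D bit for bit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side volume: nx*ny*nz floats, x varying fastest.
struct VoxelGrid {
    int nx, ny, nz;
    std::vector<float> data;

    // Fetch one voxel, clamping out-of-range indices to the border.
    float at(int i, int j, int k) const {
        i = std::min(std::max(i, 0), nx - 1);
        j = std::min(std::max(j, 0), ny - 1);
        k = std::min(std::max(k, 0), nz - 1);
        return data[(static_cast<std::size_t>(k) * ny + j) * nx + i];
    }
};

// Linear blend between a and b with weight t in [0, 1].
inline float lerp1(float a, float b, float t) { return a + t * (b - a); }

// Trilinear interpolation at (x, y, z), given in voxel units.
inline float trilinear_host(const VoxelGrid& g, float x, float y, float z)
{
    // Assumption: voxel centers sit at i + 0.5, so shift by half a voxel.
    const float xb = x - 0.5f, yb = y - 0.5f, zb = z - 0.5f;
    const float fx = std::floor(xb), fy = std::floor(yb), fz = std::floor(zb);
    const int i = static_cast<int>(fx);
    const int j = static_cast<int>(fy);
    const int k = static_cast<int>(fz);
    const float ax = xb - fx, ay = yb - fy, az = zb - fz;

    // Blend along x on the four edges of the cell, then along y, then z.
    const float c00 = lerp1(g.at(i, j,     k),     g.at(i + 1, j,     k),     ax);
    const float c10 = lerp1(g.at(i, j + 1, k),     g.at(i + 1, j + 1, k),     ax);
    const float c01 = lerp1(g.at(i, j,     k + 1), g.at(i + 1, j,     k + 1), ax);
    const float c11 = lerp1(g.at(i, j + 1, k + 1), g.at(i + 1, j + 1, k + 1), ax);
    return lerp1(lerp1(c00, c10, ay), lerp1(c01, c11, ay), az);
}
```

A host function with the three-argument signature used in this thread could wrap this with a global or static grid, mirroring the way tex_voxels is a global texture reference on the device side.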

My next attempts at working around the problem (FAIL 2 and FAIL 3), where I use the preprocessor to select the code at compile time, also fail: a __host__ __device__ function cannot call a device-only function.

// FAIL 1
__host__ float interpolate_voxel(float x, float y, float z)
{
	return interpolate_voxel_host(x,y,z);
}
__device__ float interpolate_voxel(float x, float y, float z)
{
	return tex3D(tex_voxels, x, y, z); // error: interpolate_voxel already defined
}

// FAIL 2
__device__ __host__ float interpolate_voxel(float x, float y, float z)
{
#ifdef RUN_ON_HOST
	return interpolate_voxel_host(x,y,z);
#else
	return tex3D(tex_voxels, x, y, z); // error: device-only call from a __host__ __device__ function
#endif
}


// FAIL 3
#ifdef RUN_ON_HOST
__device__ __host__ float interpolate_voxel(float x, float y, float z)
{
	return interpolate_voxel_host(x,y,z);
}
#else
__device__ __host__ float interpolate_voxel(float x, float y, float z)
{
	return tex3D(tex_voxels, x, y, z); // error: device-only call from a __host__ __device__ function
}
#endif

My final workaround works, but is extremely ugly. Instead of localizing the difference between the host and device functions, I had to define a macro DEVHOST that is used as the execution-space qualifier of every function.

#ifdef RUN_ON_HOST
#define DEVHOST __host__
DEVHOST float interpolate_voxel(float x, float y, float z)
{
	return interpolate_voxel_host(x,y,z);
}
#else
#define DEVHOST __device__
DEVHOST float interpolate_voxel(float x, float y, float z)
{
	return tex3D(tex_voxels, x, y, z);
}
#endif

This is just a suggestion, but if code like FAIL 1 were supported by the language, device/host interoperability would be much nicer, especially if you want to use both the host and the device without recompiling, or at the same time. The macro solution above means you can use only the device or only the host, but not both.

Thanks for hearing me out,
Josiah

Try this:

__host__ __device__ float interpolate_voxel(float x, float y, float z)
{
#ifdef __CUDA_ARCH__
  return tex3D(tex_voxels, x, y, z);
#else
  return interpolate_voxel_host(x,y,z);
#endif
}
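This works because __CUDA_ARCH__ is defined only while nvcc compiles the device version of the code, so each compilation pass sees only the branch it is allowed to call. Below is a minimal self-contained sketch of the same pattern. The two function bodies are placeholder arithmetic standing in for the real host interpolation and the tex3D read, interpolate_voxel_device is a hypothetical name, and the __CUDACC__ guard is an added assumption so the file also builds with a plain C++ compiler.

```cpp
// Assumption for this sketch: when the file is built by a plain C++
// compiler instead of nvcc, define the CUDA qualifiers away.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Placeholder for the real host-side interpolation (host-only function).
inline float interpolate_voxel_host(float x, float y, float z)
{
    return x + y + z;
}

// Hypothetical stand-in for the device-only path, which in the real
// code would be the tex3D(tex_voxels, x, y, z) read.
__device__ inline float interpolate_voxel_device(float x, float y, float z)
{
    return x * y * z;
}

// One function, callable from both sides. The preprocessor picks the
// body per compilation pass: __CUDA_ARCH__ exists only in the device pass.
__host__ __device__ inline float interpolate_voxel(float x, float y, float z)
{
#ifdef __CUDA_ARCH__
    return interpolate_voxel_device(x, y, z);
#else
    return interpolate_voxel_host(x, y, z);
#endif
}
```

Unlike the DEVHOST macro approach, nothing here has to be chosen at build time: the same binary can call interpolate_voxel from host code and from kernels.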

Hi Jared,

Thanks, the __CUDA_ARCH__ macro does what I wanted.

Josiah