What is the fundamental difference between __host__ __device__ and __device__ __host__?

I just want to clarify a point or two about the order in which the specifiers are written.

  • If you do __host__ __device__, you are declaring a routine on the host that can be called by the device.
  • If you do __device__ __host__, you generate code on the device and call it from the host.

Is that the gist of it?

No.

A routine decorated with __host__ instructs the compiler to generate a host-callable entry point (i.e. compile it as host code). Such a routine is host code that can only be called from other host code.

A routine decorated with __device__ instructs the compiler to generate a device-callable entry point (i.e. compile it as device code). Such a routine is device code that can only be called from other device code.

You can use both. If you do, the order does not matter: the compiler generates both types of routine described above, one with a device-callable entry point and one with a host-callable entry point.
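As a minimal sketch of this point (the names square, cube, call_from_device and run_on_device are made up for illustration), the two functions below use the two specifier orders and are compiled the same way. Each can be called directly from host code and also from device code. The small __global__ kernel is there only to reach the device side.

```cuda
#include <cuda_runtime.h>

// Same meaning in either order: nvcc emits a host version and a device version.
__host__ __device__ int square(int x) { return x * x; }
__device__ __host__ int cube(int x) { return x * x * x; }

// Hypothetical kernel used only to call the device versions.
__global__ void call_from_device(int x, int* out) {
    out[0] = square(x);
    out[1] = cube(x);
}

// Hypothetical host helper: runs the kernel and copies the two results back.
cudaError_t run_on_device(int x, int* results /* room for 2 ints */) {
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 2 * sizeof(int));
    if (err != cudaSuccess) return err;

    call_from_device<<<1, 1>>>(x, d_out);
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(results, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err;
}
```

Host code can call square(3) and cube(3) directly. On a machine with a working GPU, run_on_device(3, results) should yield the same two values computed by the device versions.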

The only situation where device code can be “called from host code” is a kernel launch, and the kernel must be decorated with __global__.
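A rough sketch of that arrangement, with hypothetical names (twice, fill_twice, launch_fill_twice): the __device__ helper is reachable only from device code, the __global__ kernel is the entry point that host code launches, and the host function does the launch. Calling twice() directly from the host function would be rejected by nvcc.

```cuda
#include <cuda_runtime.h>

// Device-only helper: callable from device code, not from host code.
__device__ int twice(int x) { return 2 * x; }

// Kernel: device code that host code starts with the <<<...>>> launch syntax.
__global__ void fill_twice(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = twice(i);
}

// Host function that launches the kernel and copies the result back.
cudaError_t launch_fill_twice(int* h_out, int n) {
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) return err;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
    fill_twice<<<blocks, threads>>>(d_out, n);
    err = cudaGetLastError();                  // reports launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err;
}
```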

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-declaration-specifiers

A routine with none of the above decorations is treated by nvcc implicitly as if it were decorated with __host__.

I see - that clears things up. Thank you!