__forceinline__ and __forceinline

Suppose that I have a method of a C++ class, say a matrix class, something like this:

int GetRows() { return NumRows; };

As far as I know, in C++ the use of “inline” is only a request to the compiler, which may ignore it. By contrast,

__forceinline int GetRows() { return NumRows; };

forces the inlining of the method.
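For reference, __forceinline is a Microsoft (MSVC) extension. GCC and Clang express the same request with __attribute__((always_inline)). Even MSVC documents that __forceinline cannot inline in every circumstance, and it emits warning C4714 when it fails. Below is a minimal host-only sketch. It assumes a hypothetical macro name MATRIX_FORCE_INLINE and a hypothetical Matrix class built around the GetRows method above.

```cpp
#include <cassert>

// Hypothetical portability macro (not a standard or library name): maps a
// "force inline" request onto each compiler's own extension.
#if defined(_MSC_VER)
#define MATRIX_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define MATRIX_FORCE_INLINE inline __attribute__((always_inline))
#else
#define MATRIX_FORCE_INLINE inline  // no known extension: plain inline hint only
#endif

class Matrix {
public:
    Matrix(int rows, int cols) : NumRows(rows), NumCols(cols) {
        assert(rows >= 0 && cols >= 0);
    }

    // Defined inside the class body, these are already implicitly inline;
    // the macro additionally asks the compiler to bypass its heuristics.
    MATRIX_FORCE_INLINE int GetRows() const { return NumRows; }
    MATRIX_FORCE_INLINE int GetCols() const { return NumCols; }

private:
    int NumRows;
    int NumCols;
};
```

Falling back to plain inline on unknown compilers keeps the header portable. The cost is that the stronger request is silently dropped there.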

Furthermore, CUDA has “__forceinline__” with the same meaning.

Now suppose that I would like to declare the above method as a “__host__ __device__” method. What is the right way to force inlining in both cases, that is, when the method is called from a host function and when it is called from a device function? Would I use

__host__ __device__ __forceinline __forceinline__ int GetRows() { return NumRows; };

Thank you very much for any answer.

The CUDA toolchain usually translates CUDA-specific attributes into platform-specific attributes for the host code, as far as that is possible. For example, the __align__ attribute is handled this way.
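The same idea can be seen with __align__. The source is written once, and each compiler receives an alignment attribute it understands. Below is a minimal sketch, assuming a hypothetical Vec4 struct. The fallback maps __align__(n) to standard C++11 alignas(n) only so the sketch builds without nvcc. That is not necessarily how CUDA's own headers spell it.

```cpp
// nvcc provides __align__(n) itself. This fallback is an assumption used only
// when building the sketch with a plain C++11 (or later) compiler.
#ifndef __CUDACC__
#define __align__(n) alignas(n)
#endif

// One spelling in the source; each compiler sees its own alignment attribute.
struct __align__(16) Vec4 {
    float x, y, z, w;
};

static_assert(alignof(Vec4) == 16, "Vec4 should be 16-byte aligned");
```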

A look at host_defines.h just now suggests that it does this for __forceinline__ as well, which means you should simply use __forceinline__ in your CUDA code. To confirm this hypothesis, you could inspect the generated host code by passing -keep to nvcc, which retains the intermediate compilation files.
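Following that advice, the matrix method from the question might look like the sketch below. It assumes a hypothetical Matrix class with NumRows/NumCols members and is meant to be compiled as a .cu file with nvcc. The #ifndef __CUDACC__ fallback is an assumption added only so the same code also builds with an ordinary C++ compiler. The sketch uses __forceinline__ alone. The plain __forceinline keyword is a Microsoft extension, so adding it as well would likely break builds that use GCC or Clang as the host compiler.

```cpp
#include <cassert>

// nvcc defines __CUDACC__ and provides __host__, __device__ and
// __forceinline__ itself. This fallback is an assumption added only so the
// sketch also builds with a plain host compiler; it is not part of CUDA.
#ifndef __CUDACC__
#define __host__
#define __device__
#define __forceinline__ inline
#endif

class Matrix {
public:
    Matrix(int rows, int cols) : NumRows(rows), NumCols(cols) {
        assert(rows >= 0 && cols >= 0);
    }

    // One qualifier covers both call sites: nvcc honors __forceinline__ for
    // device code, and for host code host_defines.h maps it to the host
    // compiler's own force-inline attribute.
    __host__ __device__ __forceinline__ int GetRows() const { return NumRows; }
    __host__ __device__ __forceinline__ int GetCols() const { return NumCols; }

private:
    int NumRows;
    int NumCols;
};

#ifdef __CUDACC__
// Device-side call site; Matrix is trivially copyable, so it can be passed
// to a kernel by value.
__global__ void StoreRows(Matrix m, int* out) { *out = m.GetRows(); }
#endif
```

Under nvcc, StoreRows can be launched from host code like any other kernel, e.g. StoreRows<<<1, 1>>>(m, dOut) with dOut obtained from cudaMalloc. That exercises GetRows on the device as well as on the host.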