Hello everyone,
One very powerful metaprogramming technique in C++ is the base-to-derived cast, often called the Barton-Nackman trick and also known as the curiously recurring template pattern, or CRTP (shown below). This idiom is the building block for techniques that enable compile-time expression simplification, elimination of temporaries, argument reordering, and so on. Both Boost.uBLAS and Eigen rely on it in the implementations of their expression hierarchies. However, it does not seem to work with CUDA. In particular, code that uses this idiom inside GPU kernels compiles successfully, but then produces many frustrating errors at run time: members of the derived class are set to zero, improperly initialized, and so on. Has anyone else had success getting this idiom to work in CUDA applications? I am using the 64-bit CUDA Toolkit 4.1 on Linux.
If you need a more thorough test case, I can put up a project that compiles and demonstrates the problems I am talking about.
template <class T, class SizeType, class Derived>
struct expression
{
    typedef T value_type;
    typedef SizeType size_type;
    typedef Derived derived_type;

    Derived& operator()()
    {
        return static_cast<Derived&>(*this);
    }

    const Derived& operator()() const
    {
        return static_cast<const Derived&>(*this);
    }
};
// An expression derived from this one would supply its own
// type as a template parameter. Algorithms would then accept
// expressions as arguments, and invoke operator() to access
// the underlying type.
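
To make the usage described in the comment concrete, here is a minimal host-only sketch. The names `vec3` and `sum` are hypothetical, made up purely for illustration. The sketch has no CUDA qualifiers and does not reproduce the GPU problem; it only shows how a derived type plugs into the base and how an algorithm recovers it.

```cpp
#include <cassert>
#include <cstddef>

// Base class from the post, repeated so the sketch compiles on its own.
template <class T, class SizeType, class Derived>
struct expression
{
    typedef T value_type;
    typedef SizeType size_type;
    typedef Derived derived_type;

    Derived& operator()()
    {
        return static_cast<Derived&>(*this);
    }

    const Derived& operator()() const
    {
        return static_cast<const Derived&>(*this);
    }
};

// Hypothetical derived expression: a fixed-size vector of three floats.
// It supplies its own type as the Derived template parameter.
struct vec3 : expression<float, std::size_t, vec3>
{
    float data[3];

    vec3(float x, float y, float z)
    {
        data[0] = x;
        data[1] = y;
        data[2] = z;
    }

    std::size_t size() const { return 3; }

    float operator[](std::size_t i) const
    {
        assert(i < 3);
        return data[i];
    }
};

// Hypothetical algorithm: accepts any expression and invokes operator()
// to get at the concrete type before touching its members.
template <class T, class SizeType, class Derived>
T sum(const expression<T, SizeType, Derived>& e)
{
    const Derived& d = e();
    T total = T();
    for (SizeType i = 0; i < d.size(); ++i)
        total += d[i];
    return total;
}
```

Calling `sum(vec3(1.0f, 2.0f, 3.0f))` deduces all three template parameters from the base class of `vec3`, so the algorithm never needs to name the concrete type explicitly.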