I want to apply a one-to-one arithmetic operation to every element of an array and store the results in another array. Is there a generic way to do this where I would only need to write the unary function?
Alternatively, if I need to write my own kernel to do this, can the unary function be passed as a template argument, or in some other way that is resolved at compile time? To optimize the kernel I would likely do some loop unrolling etc., so it would be nice to write a single generic kernel and simply plug in different unary functions at compile time to get the same optimized behaviour.
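
To make the second question concrete, here is a rough host-side C++ sketch of the pattern I have in mind. It is not a real kernel, and the names `apply_unary`, `Square` and `Negate` are made up for illustration, as is the unroll factor of 4. The idea is that the operation is a template parameter, so each instantiation would get its own copy of the unrolled loop, and I would hope the compiler inlines the call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical unary operations, written as stateless functors.
struct Square {
    float operator()(float x) const { return x * x; }
};

struct Negate {
    float operator()(float x) const { return -x; }
};

// Generic "kernel" (hypothetical name): Op is fixed at compile time,
// so the loop body is written once and reused for every operation.
template <typename Op>
void apply_unary(const float* in, float* out, std::size_t n, Op op = Op{}) {
    std::size_t i = 0;
    // Manually unrolled by 4; the factor is arbitrary, for illustration only.
    for (; i + 4 <= n; i += 4) {
        out[i]     = op(in[i]);
        out[i + 1] = op(in[i + 1]);
        out[i + 2] = op(in[i + 2]);
        out[i + 3] = op(in[i + 3]);
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        out[i] = op(in[i]);
    }
}
```

Switching operations would then just be a matter of changing the template argument, e.g. `apply_unary<Square>(in.data(), out.data(), in.size());` versus `apply_unary<Negate>(in.data(), out.data(), in.size());`, where `in` and `out` are `std::vector<float>` of equal size. Is something along these lines possible for an actual kernel?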