Transforming an array with a unary function

I want to apply a certain one-to-one arithmetic operation to every element of an array and store the output in another array. Is there a generic way to do this where I would only need to write the unary function?

Alternatively, if I need to write my own kernel to do this, can the unary function somehow be passed as e.g. a template argument, or something else determined at compile time? To optimize the kernel I would likely do some loop unrolling etc., so it would be nice to only have to write one generic kernel and simply plug-and-play different unary functions at compile time to get the same optimized behaviour.

One possible generic method is Thrust, the C++ parallel algorithms library:

https://github.com/NVIDIA/thrust
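
As a rough sketch of the Thrust route, thrust::transform applies a unary functor to every element of an input range and writes the results to an output range, so the only problem-specific code is the functor. The names TimesTwo and transform_on_device below are hypothetical, chosen for illustration; they are not from the thread.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical unary operation. Any functor whose operator() is callable
// on the device can be used in its place.
struct TimesTwo {
  __host__ __device__ int operator()(int x) const { return x * 2; }
};

// Hypothetical helper: copies the input to the GPU, applies op to every
// element with thrust::transform, and copies the results back to the host.
template <typename UnaryOp>
std::vector<int> transform_on_device(const std::vector<int>& input, UnaryOp op) {
  thrust::device_vector<int> d_in(input.begin(), input.end());
  thrust::device_vector<int> d_out(input.size());

  // Element-wise transform on the device: d_out[i] = op(d_in[i]).
  thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), op);

  std::vector<int> result(input.size());
  thrust::copy(d_out.begin(), d_out.end(), result.begin());
  return result;
}
```

Calling transform_on_device(values, TimesTwo{}) from host code returns a vector with every element doubled. Swapping in a different operation only means writing a different functor.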

Passing a function to a kernel:
There are multiple possible approaches to passing the function to be executed. It can be done with templating:
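
A minimal sketch of the templating approach, assuming the operation is written as a functor with a __device__ operator(). The names AddOne, transform_kernel and run_transform are hypothetical, and error checking is omitted for brevity. Because the functor type is a template parameter, the call is resolved at compile time and can be inlined into the kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical unary operation, selected at compile time via the template parameter.
struct AddOne {
  __device__ int operator()(int x) const { return x + 1; }
};

// Generic kernel: written once, instantiated per functor type.
// A grid-stride loop lets any grid size cover all n elements; this loop is
// also where manual unrolling or other tuning would go.
template <typename UnaryOp>
__global__ void transform_kernel(const int* in, int* out, int n, UnaryOp op) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    out[i] = op(in[i]);
  }
}

// Hypothetical host helper that runs the kernel on a std::vector.
template <typename UnaryOp>
std::vector<int> run_transform(const std::vector<int>& input, UnaryOp op) {
  const int n = static_cast<int>(input.size());
  std::vector<int> result(n);
  if (n == 0) return result;

  int* d_in = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  transform_kernel<<<blocks, threads>>>(d_in, d_out, n, op);

  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy(result.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

With this layout, run_transform(values, AddOne{}) and run_transform(values, SomeOtherOp{}) share the same kernel source, and each instantiation is compiled separately for its functor.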

But I think you could also say that this is a canonical use case for device lambda expressions:
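
A sketch of the lambda route, assuming nvcc is invoked with the --extended-lambda flag (older toolkits spell it --expt-extended-lambda), which is what allows a lambda defined in host code to carry a __device__ annotation. The names apply_kernel and square_on_device are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Generic kernel: the lambda's closure type is deduced as UnaryOp.
template <typename UnaryOp>
__global__ void apply_kernel(const float* in, float* out, int n, UnaryOp op) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = op(in[i]);
}

// Hypothetical host helper that squares every element on the GPU.
std::vector<float> square_on_device(const std::vector<float>& input) {
  const int n = static_cast<int>(input.size());
  std::vector<float> result(n);
  if (n == 0) return result;

  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  cudaMemcpy(d_in, input.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  // Extended lambda: defined in host code, executed on the device.
  // Needs --extended-lambda when compiling with nvcc.
  auto square = [] __device__ (float x) -> float { return x * x; };

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  apply_kernel<<<blocks, threads>>>(d_in, d_out, n, square);

  cudaMemcpy(result.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

The kernel is the same shape as in the functor approach; only the way the operation is spelled at the call site changes.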



I think I want to go the lambda route. Does passing a lambda as a template argument to a device function work the same as it does for regular C++11 lambdas? Does the lambda itself need to be declared with __device__?



Like this:

auto const f = [] __host__ __device__ (int const x) -> int {
  return x * 2;
};