Transforming an array with a unary function

I want to apply a one-to-one arithmetic operation to every element of an array and store the results in another array. Is there a generic way to do this where I would only need to write the unary function?

Alternatively, if I need to write my own kernel to do this, can the unary function somehow be passed as e.g. a template argument, or something else determined at compile time? To optimize the kernel I would likely do some loop unrolling etc., so it would be nice to write one generic kernel and simply plug in different unary functions at compile time to get the same optimized behaviour.

One possible generic method:

thrust::transform

GitHub - NVIDIA/thrust: The C++ parallel algorithms library.
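As a rough illustration of the thrust::transform route, here is a minimal sketch. The functor name Square and the sample data are made up for this example; the only part you would normally write yourself is the functor's operator().

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

#include <cstdio>
#include <vector>

// Hypothetical unary operation; swap in whatever arithmetic you need.
struct Square {
  __host__ __device__ float operator()(float x) const { return x * x; }
};

int main() {
  // Sample input, copied to the device when the device_vector is built.
  std::vector<float> h_in = {1.0f, 2.0f, 3.0f, 4.0f};
  thrust::device_vector<float> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<float> d_out(d_in.size());

  // Apply Square to every element of d_in and write the results to d_out.
  thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), Square());

  // Copy the results back to the host and print them.
  std::vector<float> h_out(d_out.size());
  thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
  for (float v : h_out) std::printf("%g\n", v);
  return 0;
}
```

Thrust picks the launch configuration itself, so this is probably the least code if you do not need to hand-tune the kernel.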

Passing a function to a kernel:
There are multiple possible approaches to passing the function to be executed. It can be done with templating:

https://stackoverflow.com/questions/34879789/thrust-transform-throws-error-bulk-kernel-by-value-an-illegal-memory-access-w

but I think you could also say that this would be a canonical use for device lambda expressions:

https://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/

https://devblogs.nvidia.com/parallelforall/new-compiler-features-cuda-8/
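For the templating approach mentioned above, a hand-written kernel might look roughly like the sketch below. The names transform_kernel and TimesTwoPlusOne, the grid-stride loop, and the launch configuration are all assumptions made for illustration, not something taken from the linked posts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical unary operation, resolved at compile time via the template.
struct TimesTwoPlusOne {
  __device__ float operator()(float x) const { return 2.0f * x + 1.0f; }
};

// Generic kernel: the operation is a template parameter, so the compiler
// can inline op() into the loop body.
template <typename UnaryOp>
__global__ void transform_kernel(const float* in, float* out, int n,
                                 UnaryOp op) {
  int stride = blockDim.x * gridDim.x;
  // Grid-stride loop: each thread handles elements i, i + stride, ...
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
    out[i] = op(in[i]);
  }
}

int main() {
  const int n = 1024;
  float h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, n * sizeof(float));
  cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

  // Plug in a different functor type here to reuse the same kernel.
  transform_kernel<<<4, 128>>>(d_in, d_out, n, TimesTwoPlusOne());
  cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

  for (int i = 0; i < 4; ++i) std::printf("%g\n", h_out[i]);

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

Any loop unrolling or other tuning would go inside transform_kernel once, and every functor passed to it gets the same treatment.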

I think I want to go the lambda route. Does passing a lambda as a template argument to a device function work the same as it does for regular C++11 lambdas? Does the lambda itself need to be declared with __device__?

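Yes, a lambda can be passed as a template argument in the same way.

Yes, it needs the __device__ annotation (optionally together with __host__).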

Like this:

auto const f = [] __host__ __device__ (int const x) -> int
{
  return x * 2;
};
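To show how such a lambda could be handed to a kernel, here is a minimal sketch. The kernel name apply_kernel and the launch configuration are invented for this example, and it assumes compilation with nvcc's extended lambda support enabled (--expt-extended-lambda, or --extended-lambda on newer toolkits).

```cuda
// Build (assumed command line): nvcc --extended-lambda example.cu
#include <cstdio>
#include <cuda_runtime.h>

// Generic kernel; UnaryOp is deduced from the lambda's closure type.
template <typename UnaryOp>
__global__ void apply_kernel(const int* in, int* out, int n, UnaryOp op) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = op(in[i]);
}

int main() {
  const int n = 8;
  int h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = i;

  int *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

  // Extended lambda defined in host code, callable on host and device.
  auto const f = [] __host__ __device__ (int const x) -> int {
    return x * 2;
  };

  // The lambda is passed by value like any other functor.
  apply_kernel<<<1, n>>>(d_in, d_out, n, f);
  cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

  for (int i = 0; i < n; ++i) std::printf("%d\n", h_out[i]);

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

Extended lambdas come with some restrictions on where they can be defined and what they can capture, so it is worth checking the CUDA programming guide if the compiler complains.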