Does thrust::transform invoke dynamic parallelism?

I’m trying to use thrust::transform on the device to transform two stack-allocated arrays, but I’m getting errors about pointers to local memory. Specifically:

/usr/local/cuda/include/thrust/system/cuda/detail/bulk/detail/cuda_launcher/triple_chevron_launcher.hpp(132) : Error: a pointer to local memory cannot be stored into the parameter buffer, obtained at /usr/local/cuda/include/thrust/system/cuda/detail/bulk/detail/cuda_launcher/triple_chevron_launcher.hpp(131)

Is thrust using dynamic parallelism for its device version of thrust::transform?

For reference, all I’m really doing is something like this:

#include <thrust/transform.h>
#include <thrust/execution_policy.h>

__global__
void test_kernel(void)
{
  // a and b live in the kernel's thread-local memory (the "stack")
  reg::array<float, 4> a{ 1.0f, 2.0f, 3.0f, 4.0f };
  reg::array<float, 4> b{ 0 };

  thrust::transform(thrust::device, a.begin(), a.end(), b.begin(), [](float const f) -> float
  {
    return f * f;
  });
}

reg::array is basically std::array with the proper __host__ __device__ decorations on its member functions, so it’s callable from both contexts.

It may use CUDA Dynamic Parallelism (CDP), depending on which execution policy and which compile options you have specified.

If you have specified all necessary compile options for a CDP environment, and you have specified a parallel-capable execution policy, such as thrust::device or thrust::cuda::par.on(stream), then thrust may optionally (i.e. at its discretion) dispatch the requested operation using CDP child kernels.

If you don’t want this behavior, you can disable it with the thrust::seq execution policy.

As the error message is pointing out, in a CDP setting, it is illegal to pass (i.e. attempt to use) parent kernel local memory addresses as kernel parameters for use in a child kernel.
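
To illustrate (a hand-written sketch with made-up kernel names, not your code), the restriction looks roughly like this when launching child kernels directly:

__global__ void child(float *data, int n)
{
  int i = threadIdx.x;
  if (i < n)
    data[i] *= data[i];
}

__device__ float scratch[4];   // device-global storage is visible to child kernels

__global__ void parent(void)   // assume parent<<<1, 1>>>() for simplicity
{
  float local[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

  // child<<<1, 4>>>(local, 4);  // ILLEGAL: 'local' is in the parent's local memory,
                                 // so its address can't go into the child's parameter buffer

  for (int i = 0; i < 4; ++i)    // copy into global memory first
    scratch[i] = local[i];

  child<<<1, 4>>>(scratch, 4);   // OK: a global memory address
}

The same restriction is effectively what you are hitting when thrust, under a parallel policy, tries to hand your local iterators off to a child kernel.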

Since you are using a (device) lambda expression, you should also be compiling your code with appropriate switches for that.
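
If the lambda-related switches turn out to be a nuisance, a plain functor sidesteps them entirely. Here is a sketch along the lines of your snippet (the name square is made up):

struct square
{
  __host__ __device__
  float operator()(float const f) const { return f * f; }
};

__global__
void test_kernel(void)
{
  reg::array<float, 4> a{ 1.0f, 2.0f, 3.0f, 4.0f };
  reg::array<float, 4> b{ 0 };

  // thrust::seq keeps the whole transform in the calling thread, so no child kernel
  // (and no local-memory restriction) is involved
  thrust::transform(thrust::seq, a.begin(), a.end(), b.begin(), square{});
}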

Omg, I was literally going to come back to this thread saying that I figured it out. Yeah, it was the wrong execution policy. This code works now:

// it should be transformable
  {
    reg::array<float, 4> a{ 1.0f, 2.0f, 3.0f, 4.0f };
    reg::array<float, 4> b{ 0 };
    
    thrust::transform(
      thrust::seq,
      a.begin(), a.end(),
      b.begin(),
      [](float const f) -> float
      {
        return f * f;
      });
    
    assert((b == reg::array<float, 4>{ 1.0f, 4.0f, 9.0f, 16.0f }));
  }

But what’s a “CDP” environment?

And as for the compilation settings, I’m currently using CMake. This is my current CMakeLists.txt file.

CMake with CUDA isn’t bad. It really does help simplify the build process.

It’s also cool to know that if I did want to go the dynamic parallelism route, Thrust actually has my back.

CDP = CUDA Dynamic Parallelism

A description of “necessary compile options for a CDP environment” is given in the documentation (in short: compile with relocatable device code, e.g. nvcc -rdc=true, and link against the cudadevrt library):

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compiling-and-linking

You can also refer to any of the CUDA CDP sample projects for setup.

Oh, okay. This was the first time I’ve seen CDP being used lol.

Thanks for taking the time to look at my code and suggesting thrust::seq! I wish I had looked back here sooner but part of me is happy I was able to figure it out on my own for once.