Can we call a thrust transform of a functor inside another functor?

I don’t think this could work (as I don’t think it would with thrust::for_each either), but just to strengthen the point: is something along the lines of:

struct Op1 : public thrust::unary_function<int, float>
{
  __host__ __device__
  float operator()(int iD)
  {
     // Some calculation
     return answer;
  }
};

struct Op2 : public thrust::unary_function<float, float>
{
  int *vec1Ptr_;
  int *vec2Ptr_;
  public:
    __host__ __device__
    Op2(int* vec1Ptr, int* vec2Ptr) : vec1Ptr_(vec1Ptr), vec2Ptr_(vec2Ptr) {};

    __host__ __device__
    float operator()(float input)
    {    
      thrust::transform(vec1Ptr.begin(), vec1Ptr.end(), vec2Ptr.begin(), Op1);
      // some calculation on vec2
      return answer2;
    }
};

allowed?

Generally, yes, you can embed a thrust algorithm in a functor. That thrust algorithm call could conceivably make use of another user-defined functor.

An example of a thrust functor making a thrust algorithm call is here:

https://stackoverflow.com/questions/28150098/how-to-use-thrust-to-sort-the-rows-of-a-matrix/28254765#28254765

in particular, study the construction and usage of the sort_functor there. (It doesn’t use a custom operator, though. Instead it uses a built-in operator. But that could be changed.)
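To give the flavor of that pattern right here, a rough sketch of mine (not the code from that answer; the names my_greater and row_sort_functor are invented) of a functor that sorts one row of a flattened, row-major matrix per call, using thrust::sort with the sequential execution policy and a user-defined comparator:

#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// user-defined comparator, standing in for the built-in one used in the linked answer
struct my_greater
{
  __host__ __device__
  bool operator()(int a, int b) const { return a > b; }
};

struct row_sort_functor
{
  int *data_;   // bare device pointer to the flattened matrix
  int cols_;    // row width

  __host__ __device__
  row_sort_functor(int *data, int cols) : data_(data), cols_(cols) {}

  __host__ __device__
  void operator()(int row)
  {
    // thrust::seq keeps the nested sort on the sequential, in-thread path,
    // which is what makes it callable from device code
    thrust::sort(thrust::seq, data_ + row*cols_, data_ + (row+1)*cols_, my_greater());
  }
};

On the host side, something like thrust::for_each over one index per row would drive it, which is the general shape of the construction in that linked answer.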

What you have shown will not work, possibly for several reasons. One of them is that this:

vec1Ptr.begin()

is invalid usage.

vec1Ptr is an ordinary pointer to int.

In C++, when you do something like:

thing.begin()

you are attempting to invoke a class method, for that particular object. In ordinary thrust usage,

thing.begin()

would imply that thing is an object of some class type that has a class method named begin.

This is not a true statement when thing is a bare pointer.

It would be a true statement if thing were e.g. a thrust vector:

thrust::device_vector<int> thing;
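To spell that out with a trivial, made-up snippet:

int *vec1Ptr;                          // a bare pointer: not a class type, so it has no methods
// vec1Ptr.begin();                    // does not compile: there is no begin() to call

thrust::device_vector<int> thing(10);  // a class type that does have a begin() method
thing.begin();                         // valid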

The problem would then arise that objects of type thrust::device_vector can’t be directly used in CUDA device code, as already pointed out to you here:

https://stackoverflow.com/questions/51292290/storing-a-device-vector-inside-a-functor-through-the-constructor

So you would need to craft your usage of thrust algorithms embedded in functors to make sure to invoke the device path for the algorithm, and to only use bare device pointers as iterators. I refer you to the first example link.

To anticipate the next question: yes, unlike thrust::device_vector, thrust::device_ptr is usable directly in CUDA device code. Usage-wise, it is not much different from a bare pointer. It does not have begin() and end() methods.
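For example, inside a __host__ __device__ functor body, something along these lines works (just a sketch; raw, n_ and my_op_ are placeholders of mine, and it assumes thrust/device_ptr.h, thrust/transform.h and thrust/execution_policy.h are included):

thrust::device_ptr<int> dp(raw);                          // wrap an existing bare device pointer
// there is no dp.begin()/dp.end(); dp itself already behaves like an iterator:
thrust::transform(thrust::seq, dp, dp + n_, dp, my_op_);
// the equivalent call with the bare pointer is just as valid:
thrust::transform(thrust::seq, raw, raw + n_, raw, my_op_);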

Another reason what you have shown will not work is that Op2 really has no knowledge of what Op1 is. But that is solvable with some recrafting of Op2 (see my next comment in this thread for an example).

Thank you very much, once again! I deliberately used pointers in the rough example, as I have learned a thing or two from that question by now (yay, progress!)

It is a little scary how predictable a new learner’s questions can be…

Here’s a worked example, roughly following your outline:

$ cat t1390.cu
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <iostream>
#include <iterator>

struct Op1 : public thrust::unary_function<int, int>
{
  __host__ __device__
  int operator()(int iD)
  {
     return iD+1;
  }
};

template <typename T>
struct Op2 : public thrust::unary_function<float, float>
{
  int *vec1Ptr_;
  int *vec2Ptr_;
  T my_op_;
  int n_;
  public:
    __host__ __device__
    Op2(int* vec1Ptr, int* vec2Ptr, int n, T op) : vec1Ptr_(vec1Ptr), vec2Ptr_(vec2Ptr), my_op_(op), n_(n) {}

    __host__ __device__
    float operator()(float input)
    {
      thrust::transform(thrust::seq, vec1Ptr_, vec1Ptr_+n_, vec2Ptr_, my_op_);
      return (float)(vec2Ptr_[0])+input;
    }
};

int main(){
  const int vlen = 3;
  thrust::device_vector<float>  in(1,0.5f);
  thrust::device_vector<float> out(1);
  thrust::device_vector<int> v1(vlen);
  thrust::device_vector<int> v2(vlen);
  thrust::transform(in.begin(), in.end(), out.begin(), Op2<Op1>(thrust::raw_pointer_cast(v1.data()), thrust::raw_pointer_cast(v2.data()), vlen, Op1()));
  thrust::copy(out.begin(), out.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl;
}
$ nvcc -o t1390 t1390.cu
$ ./t1390
1.5,
$

(I had to make various changes.)

The in vector starts out at 0.5. The host-code transform operation calls Op2 on that, which in turn calls Op1 through another transform call. That transform takes v1 and adds 1 to each element; the elements start out at 0, so each becomes 1. The host-code transform then takes this 1 value and adds the 0.5 input to it, which is where the 1.5 output comes from.

This is kind of silly code. One reason is that the thrust transform call contained in Op2 is called once for each element of the vector passed to Op2. Since each call is operating on the same v1 and v2 vectors, I can’t think of a sensible use case for this construct (it could probably make sense if we were working on the rows or columns of a matrix, in which case I refer you to the example linked in my previous comment). Therefore I’ve crafted my governing vectors to only have 1 element, so that the resultant call to transform in the Op2 functor only happens once, for sanity.
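To sketch the rows-of-a-matrix shape I’m alluding to (illustration only, reusing the Op1 above; rows, cols, raw_in and raw_out are placeholders), each call of the outer functor gets a row index and works on just that row, so the nested transform is no longer redundant:

#include <thrust/for_each.h>
#include <thrust/transform.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>

struct per_row_op
{
  int *in_;
  int *out_;
  int cols_;

  __host__ __device__
  per_row_op(int *in, int *out, int cols) : in_(in), out_(out), cols_(cols) {}

  __host__ __device__
  void operator()(int row)
  {
    // each call transforms one row of a flattened, row-major matrix
    thrust::transform(thrust::seq,
                      in_  + row*cols_, in_  + (row+1)*cols_,
                      out_ + row*cols_, Op1());
  }
};

// host side: one functor call per row
// thrust::for_each(thrust::device,
//                  thrust::counting_iterator<int>(0),
//                  thrust::counting_iterator<int>(rows),
//                  per_row_op(raw_in, raw_out, cols));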

Truly appreciate your detailed response and your effort!

One of the main reasons I am committing to these ‘suicidal’ routes is that the original algorithm works pretty nicely with thrust using OpenMP/Intel TBB as the back end. I am trying to extend it to thrust/CUDA to double-check the potential speed-up, since our problem is embarrassingly parallel, without refactoring the code base a lot. Having poked around different corners of thrust/CUDA for a while, I don’t think we can get away with that without major surgery on the back end (as you and others have kindly pointed out the obvious requirements).

@txbob, are there any reasons why the following code won’t compile? The first case:

#include "thrust/device_vector.h"
#include "thrust/device_ptr.h"
#include "thrust/transform.h"

struct Functor2  : public thrust::unary_function<int, int> {
  __host__ __device__
    int operator()(int i) { return 2*i;}
};


class Functor1  : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int j) {

    // Do some dummy stuff for an excuse to call another functor

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);
    thrust::transform(p_ptr, p_ptr+1000, q_ptr, Functor2());

    return j+1;
  }
};


int main(int argc, char *argv[]) {
  Functor1 myfunctor1;
  
  thrust::device_vector<int> invec(100);
  thrust::device_vector<int> outvec(100);
 
  thrust::transform(invec.begin(), invec.end(), outvec.begin(),
            myfunctor1);
  return 1;
}

The second case:

#include "thrust/device_vector.h"
#include "thrust/device_ptr.h"
#include "thrust/transform.h"
#include "thrust/for_each.h"


struct Functor2  : public thrust::unary_function<int, int> {
  __host__ __device__
    int operator()(int i) { return 2*i;}
};


class Functor1  : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int i) {

    // Do some dummy stuff for an excuse to call another functor

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);

    thrust::transform(p_ptr, p_ptr+1000, q_ptr, Functor2());

    return i+1;
  }
};


int main(int argc, char *argv[]) {

  int ncells = 100;

  Functor1 myfunctor1;
  
  thrust::device_vector<int> invec(ncells);
  for (int i = 0; i < ncells; i++) invec[i] = i;
 
  thrust::device_vector<int> outvec(ncells);
 
  thrust::transform(invec.begin(), invec.end(), outvec.begin(),
            myfunctor1);

  return 1;
}

In either case, nvcc complains that calling a host function from a host device function is not allowed. I don’t think I have anything calling host functions inside my device code, no?

Use an execution policy and specify thrust::device or thrust::seq on your transform calls in the functor. Study the code I already gave you carefully for proper usage.
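In other words, something like this for your Functor1 (my edit of your code; the only changes are the extra header and the execution policy):

#include <thrust/execution_policy.h>   // for thrust::seq

class Functor1 : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int j) {

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);

    // thrust::seq forces the nested algorithm onto the sequential, in-thread
    // code path, so no host-only dispatch code is pulled into this
    // __host__ __device__ function
    thrust::transform(thrust::seq, p_ptr, p_ptr + 1000, q_ptr, Functor2());

    return j + 1;
  }
};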

I see now - sorry I must have missed it!