Can we call a thrust transform of a functor inside another functor?

I don’t think this could work (as I don’t think it would with thrust::for_each either), but just to strengthen the point: is something along the lines of:

struct Op1 : public thrust::unary_function<int, float>
{
  __host__ __device__
  float operator()(int iD)
  {
     // Some calculation
     return answer;
  }
};

struct Op2 : public thrust::unary_function<float, float>
{
  int *vec1Ptr_;
  int *vec2Ptr_;
  public:
    __host__ __device__
    Op2(int* vec1Ptr, int* vec2Ptr) : vec1Ptr_(vec1Ptr), vec2Ptr_(vec2Ptr) {};

    __host__ __device__
    float operator()(float input)
    {    
      thrust::transform(vec1Ptr.begin(), vec1Ptr.end(), vec2Ptr.begin(), Op1);
      // some calculation on vec2
      return answer2;
    }
};

allowed?

Generally, yes, you can embed a thrust algorithm in a functor. That thrust algorithm call could conceivably make use of another user-defined functor.

An example of a thrust functor making a thrust algorithm call is here:

https://stackoverflow.com/questions/28150098/how-to-use-thrust-to-sort-the-rows-of-a-matrix/28254765#28254765

in particular, study the construction and usage of the sort_functor there. (It doesn’t use a custom operator, though. Instead it uses a built-in operator. But that could be changed.)
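To give the flavor of that pattern right here, a rough sketch of mine (not the code from that answer; the names my_greater and row_sort_functor are invented) of a functor that sorts one row of a flattened, row-major matrix per call, using thrust::sort with the sequential execution policy and a user-defined comparator:

#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// user-defined comparator, standing in for the built-in one used in the linked answer
struct my_greater
{
  __host__ __device__
  bool operator()(int a, int b) const { return a > b; }
};

struct row_sort_functor
{
  int *data_;   // bare device pointer to the flattened matrix
  int cols_;    // row width

  __host__ __device__
  row_sort_functor(int *data, int cols) : data_(data), cols_(cols) {}

  __host__ __device__
  void operator()(int row)
  {
    // thrust::seq keeps the nested sort on the sequential, in-thread path,
    // which is what makes it callable from device code
    thrust::sort(thrust::seq, data_ + row*cols_, data_ + (row+1)*cols_, my_greater());
  }
};

On the host side, something like thrust::for_each over one index per row would drive it, which is the general shape of the construction in that linked answer.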

What you have shown will not work, possibly for several reasons. One of them is that this:

vec1Ptr.begin()

is invalid usage.

vec1Ptr is an ordinary pointer to int.

In C++, when you do something like:

thing.begin()

you are attempting to invoke a class method, for that particular object. In ordinary thrust usage,

thing.begin()

would imply that thing is an object of some class type that has a class method named begin.

This is not a true statement when thing is a bare pointer.

It would be a true statement if thing were e.g. a thrust vector:

thrust::device_vector<int> thing;
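To spell that out with a trivial, made-up snippet:

int *vec1Ptr;                          // a bare pointer: not a class type, so it has no methods
// vec1Ptr.begin();                    // does not compile: there is no begin() to call

thrust::device_vector<int> thing(10);  // a class type that does have a begin() method
thing.begin();                         // valid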

The problem would then arise that objects of type thrust::device_vector can’t be directly used in CUDA device code, as already pointed out to you here:

https://stackoverflow.com/questions/51292290/storing-a-device-vector-inside-a-functor-through-the-constructor

So you would need to craft your usage of thrust algorithms embedded in functors to make sure to invoke the device path for the algorithm, and to only use bare device pointers as iterators. I refer you to the first example link.

To anticipate the next question: yes, unlike thrust::device_vector, thrust::device_ptr is usable directly in CUDA device code. Usage-wise, it is not much different from a bare pointer. It does not have begin() and end() methods.
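For example, inside a __host__ __device__ functor body, something along these lines works (just a sketch; raw, n_ and my_op_ are placeholders of mine, and it assumes thrust/device_ptr.h, thrust/transform.h and thrust/execution_policy.h are included):

thrust::device_ptr<int> dp(raw);                          // wrap an existing bare device pointer
// there is no dp.begin()/dp.end(); dp itself already behaves like an iterator:
thrust::transform(thrust::seq, dp, dp + n_, dp, my_op_);
// the equivalent call with the bare pointer is just as valid:
thrust::transform(thrust::seq, raw, raw + n_, raw, my_op_);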

Another reason what you have shown will not work is that Op2 really has no knowledge of what Op1 is. But that is solvable with some recrafting of Op2 (see my next comment in this thread for an example).

Thank you very much, once again! I deliberately used pointers in the rough example, as I have learned a thing or two from that question by now (yay, progress!)

It is a little scary how predictable a new learner’s questions can be…

Here’s a worked example, roughly following your outline:

$ cat t1390.cu
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <iostream>
#include <iterator>

struct Op1 : public thrust::unary_function<int, int>
{
  __host__ __device__
  int operator()(int iD)
  {
     return iD+1;
  }
};

template <typename T>
struct Op2 : public thrust::unary_function<float, float>
{
  int *vec1Ptr_;
  int *vec2Ptr_;
  T my_op_;
  int n_;
  public:
    __host__ __device__
    Op2(int* vec1Ptr, int* vec2Ptr, int n, T op) : vec1Ptr_(vec1Ptr), vec2Ptr_(vec2Ptr), my_op_(op), n_(n) {}

    __host__ __device__
    float operator()(float input)
    {
      thrust::transform(thrust::seq, vec1Ptr_, vec1Ptr_+n_, vec2Ptr_, my_op_);
      return (float)(vec2Ptr_[0])+input;
    }
};

int main(){
  const int vlen = 3;
  thrust::device_vector<float>  in(1,0.5f);
  thrust::device_vector<float> out(1);
  thrust::device_vector<int> v1(vlen);
  thrust::device_vector<int> v2(vlen);
  thrust::transform(in.begin(), in.end(), out.begin(), Op2<Op1>(thrust::raw_pointer_cast(v1.data()), thrust::raw_pointer_cast(v2.data()), vlen, Op1()));
  thrust::copy(out.begin(), out.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl;
}
$ nvcc -o t1390 t1390.cu
$ ./t1390
1.5,
$

(I had to make various changes.)

The in vector starts out at 0.5. The host-code transform operation calls Op2 on that, which in turn calls Op1 through another transform call. That transform takes v1 and adds 1 to each element; the elements start out at 0, so each becomes 1. The host-code transform then takes this 1 value and adds the 0.5 input to it, which is where the 1.5 output comes from.

This is kind of silly code. One reason is that the thrust transform call contained in Op2 is called once for each element of the vector passed to Op2. Since each call is operating on the same v1 and v2 vectors, I can’t think of a sensible use case for this construct (it could probably make sense if we were working on the rows or columns of a matrix, in which case I refer you to the example linked in my previous comment). Therefore I’ve crafted my governing vectors to only have 1 element, so that the resultant call to transform in the Op2 functor only happens once, for sanity.
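To sketch the rows-of-a-matrix shape I’m alluding to (illustration only, reusing the Op1 above; rows, cols, raw_in and raw_out are placeholders), each call of the outer functor gets a row index and works on just that row, so the nested transform is no longer redundant:

#include <thrust/for_each.h>
#include <thrust/transform.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>

struct per_row_op
{
  int *in_;
  int *out_;
  int cols_;

  __host__ __device__
  per_row_op(int *in, int *out, int cols) : in_(in), out_(out), cols_(cols) {}

  __host__ __device__
  void operator()(int row)
  {
    // each call transforms one row of a flattened, row-major matrix
    thrust::transform(thrust::seq,
                      in_  + row*cols_, in_  + (row+1)*cols_,
                      out_ + row*cols_, Op1());
  }
};

// host side: one functor call per row
// thrust::for_each(thrust::device,
//                  thrust::counting_iterator<int>(0),
//                  thrust::counting_iterator<int>(rows),
//                  per_row_op(raw_in, raw_out, cols));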

Truly appreciate your detailed response and your effort!

One of the main reasons I am committing to these ‘suicidal’ routes is that the original algorithm works pretty nicely with thrust using OpenMP/Intel TBB as the back end. I am trying to extend it to thrust/CUDA to double-check the potential speed-up, since our problem is embarrassingly parallel, without refactoring the code base a lot. Having poked around different corners of thrust/CUDA for a while, I don’t think we can get away with that without major surgery on the back end (as you and others have kindly pointed out the obvious requirements).

@txbob, are there any reasons why the following code won’t compile? The first case:

#include "thrust/device_vector.h"
#include "thrust/device_ptr.h"
#include "thrust/transform.h"

struct Functor2  : public thrust::unary_function<int, int> {
  __host__ __device__
    int operator()(int i) { return 2*i;}
};


class Functor1  : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int j) {

    // Do some dummy stuff for an excuse to call another functor

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);
    thrust::transform(p_ptr, p_ptr+1000, q_ptr, Functor2());

    return j+1;
  }
};


int main(int argc, char *argv[]) {
  Functor1 myfunctor1;
  
  thrust::device_vector<int> invec(100);
  thrust::device_vector<int> outvec(100);
 
  thrust::transform(invec.begin(), invec.end(), outvec.begin(),
            myfunctor1);
  return 1;
}

The second case:

#include "thrust/device_vector.h"
#include "thrust/device_ptr.h"
#include "thrust/transform.h"
#include "thrust/for_each.h"


struct Functor2  : public thrust::unary_function<int, int> {
  __host__ __device__
    int operator()(int i) { return 2*i;}
};


class Functor1  : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int i) {

    // Do some dummy stuff for an excuse to call another functor

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);

    thrust::transform(p_ptr, p_ptr+1000, q_ptr, Functor2());

    return i+1;
  }
};


int main(int argc, char *argv[]) {

  int ncells = 100;

  Functor1 myfunctor1;
  
  thrust::device_vector<int> invec(ncells);
  for (int i = 0; i < ncells; i++) invec[i] = i;
 
  thrust::device_vector<int> outvec(ncells);
 
  thrust::transform(invec.begin(), invec.end(), outvec.begin(),
            myfunctor1);

  return 1;
}

In either case, nvcc complains that calling a host function from a host device function is not allowed. I don’t think I have anything calling host functions inside my device code, no?

Use an execution policy and specify thrust::device or thrust::seq on your transform calls in the functor. Study the code I already gave you carefully for proper usage.
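In other words, something like this for your Functor1 (my edit of your code; the only changes are the extra header and the execution policy):

#include <thrust/execution_policy.h>   // for thrust::seq

class Functor1 : public thrust::unary_function<int, int> {
 public:
  __host__ __device__
  int operator()(int j) {

    int p[1000], q[1000];
    thrust::device_ptr<int> p_ptr(p), q_ptr(q);

    // thrust::seq forces the nested algorithm onto the sequential, in-thread
    // code path, so no host-only dispatch code is pulled into this
    // __host__ __device__ function
    thrust::transform(thrust::seq, p_ptr, p_ptr + 1000, q_ptr, Functor2());

    return j + 1;
  }
};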

I see now - sorry I must have missed it!