I’ve been trying to wrap my head around how to achieve this, but I’m struggling to understand how to combine these iterators. The full summed area table example I’m working from is here: https://github.com/thrust/thrust/blob/master/examples/summed_area_table.cu

I tried to include only the relevant details, so apologies for the code dump! I’m just not sure how to combine all three of these iterators (zip, transform, and counting). To start, here is the original iterator setup I’m working with:

```
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/scan.h>

// small modification to my code-base, I have all of these at compile time
template <class Projection>
struct row_index : public thrust::unary_function<unsigned, unsigned> {
    static constexpr unsigned cols = Projection::Depth::Width;

    // Maps a flat index to its row; the row number is the scan key.
    __host__ __device__
    unsigned operator()(unsigned i) const {
        return i / cols;
    }
};

// ... in some method below ...
static constexpr unsigned cols = Projection::Depth::Width;
static constexpr unsigned rows = Projection::Depth::Height;
thrust::counting_iterator<unsigned> indices(0);
thrust::inclusive_scan_by_key(
    thrust::make_transform_iterator(indices, row_index<Projection>()),
    thrust::make_transform_iterator(indices, row_index<Projection>()) + (rows * cols),
    // ...
```
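To spell out what those keys look like, here is a minimal host-only C++ sketch that mirrors `row_index` without Thrust. The names `row_of` and `row_keys` are hypothetical and exist only for this illustration; I’m assuming a flat row-major layout and `cols > 0`.

```cpp
#include <cassert>
#include <vector>

// Host-only mirror of row_index: maps a flat index to its row (assumes cols > 0).
inline unsigned row_of(unsigned i, unsigned cols) { return i / cols; }

// Keys for a rows x cols grid, in flat row-major order.
inline std::vector<unsigned> row_keys(unsigned rows, unsigned cols) {
    std::vector<unsigned> keys(rows * cols);
    for (unsigned i = 0; i < rows * cols; ++i) {
        keys[i] = row_of(i, cols);
    }
    return keys;
}
```

For a 2 x 3 grid this gives the keys 0, 0, 0, 1, 1, 1, so each row forms one segment of the scan.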

But what I actually need to do here is zip in an extra input to compute a modified summed area table (specifically, the “second order derivative”). Sparing the gory details, my input arrives as a **float4** array, call it **positions**, where only XYZ matter in this case (W is used elsewhere). The second-order result is a **float6** (my own struct, with members a through f), holding one component per unique product: xx, xy, xz, yy, yz, zz. I wrote a simple extrapolation kernel to “upcast” the input **float4** array to a **float6** array, which I then scan and transpose; that part works fine. But I still need the original points during the scan, so I thought a zip iterator of some kind would be the way to go. This is the binary operator I’m trying to write:

```
// float6 is my own struct (members a..f, with operator+ defined)
using f4x6 = thrust::tuple<float4, float6>;
struct derive : public thrust::binary_function<f4x6, f4x6, float6> {
    __host__ __device__
    float6 operator()(const f4x6 &x, const f4x6 &y) const {
        // Sum the two partial float6 results.
        float6 ret = thrust::get<1>(x) + thrust::get<1>(y);
        // This may need to be thrust::get<0>(x), but I need to make the
        // iterator first...
        float4 y4 = thrust::get<0>(y);
        ret.a += y4.x * y4.x;
        ret.b += y4.x * y4.y;
        ret.c += y4.x * y4.z;
        ret.d += y4.y * y4.y;
        ret.e += y4.y * y4.z;
        ret.f += y4.z * y4.z;
        return ret;
    }
};
```
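To make the arithmetic concrete, here is a minimal host-only C++ sketch of what I want each point to contribute. `Float4`, `Float6`, `second_order`, and `derive_host` are hypothetical stand-ins (for CUDA’s `float4`, my own `float6`, and the operator above) so that this compiles without CUDA; none of them are part of Thrust.

```cpp
#include <cassert>

// Hypothetical host-side stand-ins so the sketch compiles without CUDA.
struct Float4 { float x, y, z, w; };
struct Float6 { float a, b, c, d, e, f; };

// Component-wise sum, mirroring the operator+ that derive relies on.
inline Float6 operator+(const Float6 &l, const Float6 &r) {
    return {l.a + r.a, l.b + r.b, l.c + r.c, l.d + r.d, l.e + r.e, l.f + r.f};
}

// The six unique second-order products of XYZ; W is ignored.
inline Float6 second_order(const Float4 &p) {
    return {p.x * p.x, p.x * p.y, p.x * p.z, p.y * p.y, p.y * p.z, p.z * p.z};
}

// Same arithmetic as derive: add both partial sums, then add y's products.
inline Float6 derive_host(const Float6 &x6, const Float4 &y4, const Float6 &y6) {
    return x6 + y6 + second_order(y4);
}
```

For a point with XYZ = (1, 2, 3), `second_order` gives (1, 2, 3, 4, 6, 9). One thing I’m unsure about: as far as I understand, the scan functions expect an associative operator, and I have not checked whether `derive` as written satisfies that.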

I’ve been staring at the awesome answers here: https://devtalk.nvidia.com/default/topic/786766/thrust-exclusive_scan-with-thrust-zip_iterator-/

However, I still cannot figure out how to build the iterators. Specifically, I don’t see how to create a transform iterator from both a zip iterator **and** a counting iterator. Something like

```
thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(float4_src, float6_src) // device pointers
    ) /* [ ? ] somehow "AND indices" */
```

I clearly don’t get how (or if) this can be done. Nothing I’ve tried that seems like it should make sense will compile.
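In scalar terms, what I’m after is roughly the loop below. This is a host-only C++ sketch of the result I expect, not Thrust code: for each flat index `i` it needs both the row key `i / cols` and the original point at `i`. `Point`, `Moments`, and `row_scan_reference` are hypothetical names for illustration, and I’m assuming the same row-major layout as `row_index`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y, z, w; };          // stand-in for float4
struct Moments { float a, b, c, d, e, f; };  // stand-in for my float6

// Inclusive running sum of second-order products, restarted at each new row.
// Assumes cols > 0 and that pts is stored in flat row-major order.
std::vector<Moments> row_scan_reference(const std::vector<Point> &pts, unsigned cols) {
    std::vector<Moments> out(pts.size());
    Moments run{};
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (i % cols == 0) {
            run = Moments{};  // the row key i / cols just changed
        }
        const Point &p = pts[i];
        run.a += p.x * p.x;
        run.b += p.x * p.y;
        run.c += p.x * p.z;
        run.d += p.y * p.y;
        run.e += p.y * p.z;
        run.f += p.z * p.z;
        out[i] = run;
    }
    return out;
}
```

Something like this could serve as a reference to compare the device output against once the iterators compile.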

Any thoughts / suggestions? I just need to access the original input while building out the sum. Thanks for any advice!