combining thrust::{zip_iterator,transform_iterator,counting_iterator} for modified summed area table

svennevs · December 14, 2017, 4:00am

I’ve been trying to wrap my head around how to achieve this, but I’m struggling to understand how to combine these iterators. The full summed area table example is here: summed_area_table.cu on the master branch of NVIDIA/thrust on GitHub.

I tried my best to only include the relevant details, apologies for the code-dump! I’m just not sure how to combine all three of these iterators. So, to start, the original iterator we’re working with:

// small modification to my code-base, I have all of these at compile time
template <class Projection>
struct row_index : public thrust::unary_function<unsigned, unsigned> {
    static constexpr unsigned cols = Projection::Depth::Width;
  
    __host__ __device__
    unsigned operator()(unsigned i) const {
        return i / cols;
    }
};

// ... in some method below ...

static constexpr unsigned cols = Projection::Depth::Width;
static constexpr unsigned rows = Projection::Depth::Height;

thrust::counting_iterator<unsigned> indices(0);

thrust::inclusive_scan_by_key(
    thrust::make_transform_iterator(indices, row_index<Projection>()),
    thrust::make_transform_iterator(indices, row_index<Projection>()) + (rows * cols),
    // ...
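
For reference, here is a minimal host-only sketch in plain C++ (no Thrust) of what a row-keyed inclusive scan is expected to produce when the key of element i is i / cols. The function name `row_inclusive_scan` and the runtime `cols` parameter are hypothetical; the snippet above takes the width at compile time from `Projection::Depth::Width`. It is meant only as a way to check device results on small inputs, not as a replacement for `thrust::inclusive_scan_by_key`.

```cpp
#include <cstddef>
#include <vector>

// Host-only reference (hypothetical helper): inclusive scan that restarts
// whenever the key i / cols changes, i.e. at the first element of each row.
std::vector<float> row_inclusive_scan(const std::vector<float>& in, unsigned cols) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // i % cols == 0 means the key of element i differs from the key of i - 1.
        const bool row_start = (i % cols == 0);
        out[i] = row_start ? in[i] : out[i - 1] + in[i];
    }
    return out;
}
```

With `cols = 3`, the input `1 2 3 4 5 6` is treated as two rows and each row is summed independently.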

But what I actually need to do here is zip in an extra input to compute a modified summed area table (specifically, the “second order derivative”). Sparing the gory details, my input comes in as float4 (call it positions), and only XYZ matter here; the W coordinate is used elsewhere. The second-order terms are the six unique products of X, Y, and Z, so they are stored as a float6. I wrote a simple extrapolation kernel to “upcast” the input float4 array to a float6 array that the scanning and transposing actually run on, and that works fine. But I need the original points to compute the second-order sums, so I thought a zip iterator of some kind would be the way to go. I’m trying to achieve the following binary operator:

using f4x6 = thrust::tuple<float4, float6>;

struct derive : public thrust::binary_function<f4x6, f4x6, float6> {
    __host__ __device__
    float6 operator()(const f4x6 &x, const f4x6 &y) const {
        // Start from the sum of the two float6 values.
        float6 ret = thrust::get<1>(x) + thrust::get<1>(y);
        // This may need to be thrust::get<0>(x), but I need to make the
        // iterator first...
        float4 y4  = thrust::get<0>(y);
        // Add the six unique XYZ products of the original point.
        ret.a += y4.x * y4.x;
        ret.b += y4.x * y4.y;
        ret.c += y4.x * y4.z;
        ret.d += y4.y * y4.y;
        ret.e += y4.y * y4.z;
        ret.f += y4.z * y4.z;
        return ret;
    }
};
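
One possible reading of the operator above is that it does two jobs at once: it forms the six products from a float4, and it adds float6 values. Splitting those jobs may make the iterator question simpler. A scan operator is normally expected to be associative and to accept its own result as an input, which a (tuple, tuple) -> float6 signature does not obviously allow. The host-only sketch below (plain C++, no Thrust) assumes the goal is a per-row running sum of the products: each point is first mapped to its six products, and the scan then uses plain component-wise addition. `Vec4`, `Moments6`, `second_order`, and `scan_second_order` are hypothetical stand-ins, not names from the original code. On the device, the unary step would presumably be wrapped with `thrust::make_transform_iterator` over positions and passed as the values range of `thrust::inclusive_scan_by_key`, whose keys range is a separate argument and so would not need to be zipped with the values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for CUDA's float4 and the post's float6.
struct Vec4 { float x, y, z, w; };
struct Moments6 { float a, b, c, d, e, f; };

// Component-wise addition: associative up to floating-point rounding,
// and it maps (Moments6, Moments6) -> Moments6, so a scan can reuse its result.
inline Moments6 operator+(const Moments6& l, const Moments6& r) {
    return {l.a + r.a, l.b + r.b, l.c + r.c, l.d + r.d, l.e + r.e, l.f + r.f};
}

// Unary step: the six unique XYZ products of one point (W is ignored).
inline Moments6 second_order(const Vec4& p) {
    return {p.x * p.x, p.x * p.y, p.x * p.z, p.y * p.y, p.y * p.z, p.z * p.z};
}

// Row-wise inclusive scan of second_order(p); restarts every `cols` elements.
std::vector<Moments6> scan_second_order(const std::vector<Vec4>& pts, unsigned cols) {
    std::vector<Moments6> out(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const Moments6 m = second_order(pts[i]);
        out[i] = (i % cols == 0) ? m : out[i - 1] + m;
    }
    return out;
}
```

Because the products are computed per element before the scan, the original points never have to travel through the scan operator, which is the part that a zip iterator was meant to solve.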

I’ve been staring at the awesome answers in the thread “thrust::exclusive_scan with thrust::zip_iterator?” (CUDA Programming and Performance, NVIDIA Developer Forums), but cannot seem to figure out how to approach building out the iterators. I can’t figure out how to create a transform iterator from both a zip iterator and a counting iterator. Something like:

thrust::make_transform_iterator(
    thrust::make_zip_iterator(
        thrust::make_tuple(float4_src, float6_src) // device pointers
    ) [ ? ] "AND indices"

I clearly don’t get how (or if) this can be done; nothing I’ve tried that seems sensible will compile.

Any thoughts / suggestions? I just need to access the original input while building out the sum. Thanks for any advice!
