Thrust reduction question

I’m trying to figure out how to use the thrust library to do a certain kind of reduction, but I’m not seeing a straightforward way to do it.

I want to take an array that contains a series of binary strings and compare the binary strings to a template, then count how many of the string elements match and divide by the length of the string. For example:

A = [ 0 0 0 1;
0 0 1 0;
0 0 1 1 ]

B = [ 0 1 1 1]

For the 1st row in A → 0=0, 1!=0, 1!=0, 1=1 → 2 matches → 2/4 = 0.5
2nd row in A → 0=0, 1!=0, 1=1, 1!=0 → 2 matches → 2/4 = 0.5
3rd row in A → 0=0, 1!=0, 1=1, 1=1 → 3 matches → 3/4 = 0.75

I want the fraction or percentage of the binary string that matches as the result of the reduction and not just a boolean indicating that the binary string does or does not match.

I found a thrust routine to do a summation reduction, but I don’t want to just sum the columns of A. And I found a thrust routine to do a reduction by equivalence comparison, but it just returns ‘true’ or ‘false’ for whether or not the entire sequence matches.

Should I approach this some other way? I was using the approach of having a separate thread compare each row of A to the template and store that result in a temporary location, but I am hoping that using thrust will let me move away from programming the kernel routines myself.

Thanks for guidance.

You can do this with a call to transform_reduce. The transform portion will take your array of binary strings, and, using a custom functor, compute your output match result for each string, putting the results in a float vector. The reduce portion would just add up the float vector. Since your representation of a vector of strings is not obvious to me, to go any farther I would need to understand the details of your data storage/organization.

Here’s a slightly different approach.

  1. Flatten your A matrix into a thrust vector.
  2. Replicate your B matrix into a vector of the same length:

A = [0 0 0 1 0 0 1 0 0 0 1 1]
B = [0 1 1 1 0 1 1 1 0 1 1 1]

  1. Create a C vector that is 1 if there is a match between the two:

C = [1 0 0 1 1 0 1 0 1 0 1 1]

  1. Do an ordinary sum reduction on C:

res = 7

  1. Divide by N, the length of the template

ans = 7/4 = 1.75

Note that steps 2 and 3, the creation of B and C vectors, don’t have to be done explicitly, they can be handled through fusion. Also, since transform_reduce can only accept a single input vector, we must zip the A and (implicit) B vectors together. Here’s a fully worked example:

#include <iostream>
#include <stdlib.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform.h>

#define A_ROWS 3
#define A_COLS 4
#define RANGE 2

struct my_modulo_functor : public thrust::unary_function<int, int>
{
  __host__ __device__
  int operator() (int idx) {
    return idx%(A_COLS);}
};

struct my_tuple_functor : public thrust::unary_function<thrust::tuple<int, int>, int>
{
  __host__ __device__
  int operator() (thrust::tuple<int, int> a) {
    if (a.get<0>() == a.get<1>()) return 1;
    return 0;}
};

typedef thrust::transform_iterator<my_modulo_functor, thrust::counting_iterator<int>, int> my_titer;
typedef thrust::permutation_iterator<thrust::device_vector<int>::iterator, my_titer> my_piter;
typedef thrust::zip_iterator<thrust::tuple<thrust::device_vector<int>::iterator, my_piter> > zip2int;

int main(){
  int A_mat[A_ROWS*A_COLS] = { 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1 };
  int B_mat[A_COLS] = {0, 1, 1, 1};

  thrust::host_vector<int> A(A_ROWS*A_COLS); // the data
  thrust::host_vector<int> B(A_COLS);        // the template
  // synthetic data:
//  for (int i = 0; i < A_ROWS*A_COLS; i++) A[i] = rand()%RANGE;
//  for (int i = 0; i < A_COLS; i++) B[i] = rand()%RANGE;
  // instead flatten/copy matrices into vectors A, B
  for (int i = 0; i < A_ROWS*A_COLS; i++) A[i] = A_mat[i];
  for (int i = 0; i < A_COLS; i++) B[i] = B_mat[i];

  thrust::device_vector<int> d_A = A;
  thrust::device_vector<int> d_B = B;
  zip2int first = thrust::make_zip_iterator(thrust::make_tuple(d_A.begin(), thrust::make_permutation_iterator(d_B.begin(), thrust::make_transform_iterator(thrust::counting_iterator<int>(0), my_modulo_functor()))));
  zip2int last = first + (A_ROWS*A_COLS);
  int red_sum = thrust::transform_reduce(first, last, my_tuple_functor(), 0, thrust::plus<int>());
  float res = (float)red_sum/(float)(A_COLS);
  std::cout << "result = " << res <<   std::endl;
  return 0;
}