 # Need help understanding thrust::scatter!

Okay, I just found out a really cool thing to do is sort any number of arrays by key by splitting up the operation like so:

``````thrust::sequence(indices.begin(), indices.end());
thrust::sort_by_key(keyvals.begin(), keyvals.end(), indices.begin());

foreach (vector_to_be_sorted)
thrust::scatter(data.begin(), data.end(), indices.begin(), sorted_data.begin());
``````

Reading the documentation it says, “For each iterator i in the range [first, last), the value i is assigned to output[(map + (i - first))].”

Okay, let’s say we have some data.

``````keys = { 0, 7, 6, 4, 2, 3, 5, 1 };
data = { a, b, c, d, e, f, g, h };
``````

Doing the initial sort, we get

``````keys    = { 0, 1, 2, 3, 4, 5, 6, 7 };
indices = { 0, 7, 4, 5, 3, 6, 2, 1 };
``````

In theory, data should look like this at the end :

``````data = { a, h, e, f, d, g, c, b };
``````

But every time I try to do this on a piece of paper, I swear I’m doing it wrong.

From the description, it sounds

``````data = a is mapped to indices = 0
data = b is mapped to indices = 7
data = c is mapped to indices = 4
``````

The one for data does not look true at all!

Instead, it seems to work if

``````output[i] = data[indices[i]]
``````

I think I’m just really confusing myself here…

Let’s start with unsorted data, and a linear key sequence. The first trick is to reverse the meaning of keys and values (data):

``````int keys = { 107, 103, 104, 100 };
int vals = {   0,   1,   2,   3 };
``````

If I then do a thrust sort_by_key, using the above “keys” as my key array and the above “vals” as my values array, I will end up with:

``````keys = { 100, 103, 104, 107 };
vals = {   3,   1,   2,   0 };
``````

I can now use the “vals” array as the output index into a “results” array to sort any other array according to the same sort order. For example if I had a data array like this:

``````char data = { 'h', 'd', 'e', 'a' };
``````

and an “empty” results array of char:

``````char rslt = {  .,  .,   .,  . };
``````

then I can do a thrust::scatter using the sorted vals as my “map” array:

``````data  = { 'h', 'd', 'e', 'a' };
vals = {   3,   1,   2,   0 };
rslt = { 'a', 'd', 'e', 'h' };
``````

‘h’ went to position 3 in the result array because that’s where the index in vals indicated it should go during the scatter.

Indeed,

``````rslt[vals[i]] = data[i]  for i in 0,1,2...N
``````

Gah, this is all so confusing! When you do it, it works but when I do it with a slightly larger example it seems to fail.

For example, consider this :

``````pa  : 7, 3, 4, 0, 2, 5, 1, 6
idx : 0, 1, 2, 3, 4, 5, 6, 7
ta  : a, b, c, d, e, f, g, h

sorted by pa as key...
pa  : 0, 1, 2, 3, 4, 5, 6, 7
idx : 3, 6, 4, 1, 2, 5, 7, 0
ta  : d, g, e, b, c, f, h, a
``````

But when I run this code, the output is wrong! I’m getting

``````h
d
e
a
c
f
b
g
``````

But with my own custom kernel, I get the output that seems correct :

``````d
g
e
b
c
f
h
a
``````

Here’s the code :

``````// nvcc -std=c++11 -O3 -o scatter scatter.cu

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/scatter.h>
#include <thrust/sort.h>

const int tpb = 256,
bpg = 512;

__device__
{
return blockDim.x * blockIdx.x + threadIdx.x;
}

__device__
int grid_size(void)
{
return blockDim.x * gridDim.x;
}

__global__
void scatter
(
const int n,
const int * __restrict__ input,
const int * __restrict__ indices,
int * __restrict__ output
)
{
for (int tid = currThreadID(); tid < n; tid += grid_size())
{
const int idx = indices[tid];
const int tmp = input[idx];
output[tid] = tmp;
}
}

int main(void)
{
const int  pa = {  7,   3,   4,   0,   2,   5,   1,   6 };
const int idx = {  0,   1,   2,   3,   4,   5,   6,   7 };
const char ta = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };

thrust::device_vector<int> d_pa(pa, pa + 8),
d_idx(idx, idx + 8),
d_ta(ta, ta + 8),
output(8, -1);

thrust::sort_by_key(d_pa.begin(),
d_pa.end(),
d_idx.begin());

/*thrust::scatter(d_ta.begin(),
d_ta.end(),
d_idx.begin(),
output.begin());*/

scatter<<<bpg, tpb>>>(8, thrust::raw_pointer_cast(d_ta.data()), thrust::raw_pointer_cast(d_idx.data()), thrust::raw_pointer_cast(output.data()));

for (int i = 0; i < 8; ++i)
{
std::cout << ((char ) output[i]) << std::endl;
}

return 0;
}
``````

Your kernel does not match the scatter operation provided by thrust.

``````output[i] = input[indices[i]];
``````

The thrust scatter operation is:

``````output[indices[i]] = input[i];
``````

They are not the same. (Take a look at the last line of my previous post.)

If you rewrite your kernel as follows:

``````__global__
void scatter
(
const int n,
const int * __restrict__ input,
const int * __restrict__ indices,
int * __restrict__ output
)
{
for (int tid = currThreadID(); tid < n; tid += grid_size())
{
const int idx = indices[tid];
const int tmp = input[tid];
output[idx] = tmp;
}
}
``````

I think you’ll get results that match betwen your kernel and thrust.

Note that this realization:

``````output[i] = input[indices[i]];
``````

is the thrust::gather operation:

https://thrust.github.io/doc/group__gathering.html#ga6fdb1fe3ff0d9ce01f41a72fa94c56df

Interesting. Then why is my thrust::scatter operation giving wrong results? Why does gathering seem to work then?

Either you are not understanding what I said, or I am not understanding what you said. I interpreted this excerpt from your comment:

to mean that those were the results you were getting from thrust::scatter. (they are certainly not the results that your kernel produced.) Those are the correct results. If you modify your kernel to match what I indicated, then your kernel will produce that same output.

Am I doing my sort_by_key correctly? I swear to God I’m doing it right on paper but every time I use thrust::scatter, I get weird results.

Have you been able to successfully make the sorting work using thrust::scatter? I know you posted that small example but why is it not working for my dataset?

The sorted characters should be :

``````d g e b c f h a
``````

our examples are different. The first line of my first response was “let’s start with unsorted data”.

My data order “matched” my key order:

``````int keys = { 107, 103, 104, 100 };
char data = { 'h', 'd', 'e', 'a' };
``````

``````pa  : 7, 3, 4, 0, 2, 5, 1, 6