I have three device_vectors - “similarity”, “group_ids”, “object_ids” with a dimension of 5_000_000.
Vectors are already sorted by “group_ids”, and within “group_ids” by “similarity” (double sorting).
“similarity” - assessment of the similarity of the object;
“object_ids” - unique object index;
“group_ids” - index of the group to which the object belongs.
Also, from the input I have the number “limit” and “group_ids_to_search”.
“group_ids_to_search” - a vector with indices “id” of groups for which it is necessary to generate results.
The response must contain no more than “limit” values for each “id” of “group_ids_to_search” and “group_ids”.
If “id” is not present in “group_ids”, fill it with 0 responses.
For example, imagine “similarity”, “group_ids”, “object_ids” vectors of dimension 9:
INPUT:
similarity |0.9, 0.85, 0.8, 0.77,| 0.98, 0.8,| 0.86, 0.7, 0.44|
group_ids |1, 1, 1, 1, | 2, 2, | 3, 3, 3 |
object_ids |1, 4, 5, 2, | 3, 7, | 6, 9, 8 |
limit = 2;
group_ids_to_search = {1,4,2,3};
OUTPUT:
similarity_out |0.9, 0.85,| 0, 0, | 0.98, 0.8,| 0.86, 0.7|
group_ids_out |1, 1, | 4, 4, | 2, 2, | 3, 3, |
object_ids_out |1, 4, | 0, 0, | 3, 7, | 6, 9, |
Since id=4 from (group_ids_to_search) is not in group_ids, its “similarity_out” and “object_ids_out” are filled with the value 0.
I implement this functionality using the “Thrust” library.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/binary_search.h>
#include <thrust/execution_policy.h>
#include <thrust/copy.h>
#include <iostream>
template <typename T>
void printDeviceVector(const std::string& message, const thrust::device_vector<T>& list_ids) {
std::cout << message<< ": ";
for (const T& id : list_ids) {
std::cout << id << " ";
}
std::cout << std::endl;
}
int main()
{
// Create INPUT
thrust::device_vector<float> similarity = {0.9, 0.85, 0.8, 0.77, 0.98, 0.8, 0.86, 0.7, 0.44};
thrust::device_vector<int> group_ids = {1, 1, 1, 1, 2, 2, 3, 3, 3};
thrust::device_vector<int> object_ids = {1, 4, 5, 2, 3, 7, 6, 9, 8};
thrust::device_vector<int> group_ids_to_search = {1,4,2,3};
int limit = 2;
size_t size_search = group_ids_to_search.size();
// Create OUTPUT
thrust::device_vector<float> similarity_out(size_search*limit);
thrust::device_vector<int> object_ids_out(size_search*limit);
thrust::device_vector<int> group_ids_out(size_search*limit);
// Create thrust::device_vector to store pointers to the lower and upper bounds of the array group_ids
thrust::device_vector<unsigned int> lower_bounds(size_search);
thrust::device_vector<unsigned int> upper_bounds(size_search);
auto lower = thrust::lower_bound(
thrust::device,
group_ids.begin(),
group_ids.end(),
group_ids_to_search.begin(),
group_ids_to_search.end(),
lower_bounds.begin()
);
auto upper = thrust::upper_bound(
thrust::device,
group_ids.begin(),
group_ids.end(),
group_ids_to_search.begin(),
group_ids_to_search.end(),
upper_bounds.begin());
for (int i=0, sim_j = 0; i<size_search; i++, sim_j = sim_j + limit){
// Slice by "similarity" and "object_ids"
thrust::device_vector<float> similarity_slice = {similarity.begin() + lower_bounds[i], similarity.begin() + upper_bounds[i]};
thrust::device_vector<int> object_ids_slice = {object_ids.begin() + lower_bounds[i], object_ids.begin() + upper_bounds[i]};
thrust::device_vector<int> group_ids_slice(limit);
thrust::fill( group_ids_slice.begin(), group_ids_slice.end(), group_ids_to_search[i]);
// Resize. If the size of the vector was 0, then it will fill its values with zeros. (For the case when "id" = 4 from group_ids_to_search)
similarity_slice.resize(limit);
object_ids_slice.resize(limit);
// Fill the output arrays
thrust::copy(similarity_slice.begin(), similarity_slice.end(), similarity_out.begin()+sim_j);
thrust::copy(object_ids_slice.begin(), object_ids_slice.end(), object_ids_out.begin()+sim_j);
thrust::copy(group_ids_slice.begin(), group_ids_slice.end(), group_ids_out.begin()+sim_j);
}
printDeviceVector("similarity_out", similarity_out);
printDeviceVector("group_ids_out", group_ids_out);
printDeviceVector("object_ids_out", object_ids_out);
}
However, on vectors “similarity”, “group_ids”, “object_ids” of size 5_000_000 and “group_ids_to_search” of size 1000, this gives a slow result of ~80_000 microseconds.
If the last operations in the for loop are transferred to “host_vector”, then the result is ~30_000 microseconds, which is also slow.
I don’t have enough understanding of how to get rid of the slow for loop to generate the final vectors.
I think this can be done using thrust::for_each and thrust::transform and functors, but I don’t understand how exactly. Please help.