I need to count the number of elements in each group (and sum their values).
I have about 32,000 groups, and each object belongs to about 1000 of them.
Here is my kernel that does the summation:
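// Each work-item (factor) scans all 100000 objects and accumulates
// (sum of values, element count) into mx, starting at offset factor * 32.
kernel void VectorAdd3(
    global const int* index,     // per-object group offset
    global const float* values,  // per-object value to sum
    global const short* data,    // per-(factor, object) offset, 100000 entries per factor
    global float2* mx)           // output: .x = sum, .y = count
{
    int factor = get_global_id(0);
    int fin = factor * 32;
    for (int i = 0, fi = factor * 100000; i < 100000; i++, fi++) {
        int mindex = index[i] + fin + data[fi];
        mx[mindex] += (float2)(values[i], 1);
    }
}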
The kernel is launched with a global size of 1024, so factor ranges from 0 to 1023.
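To make the intended result unambiguous, here is a plain-C sketch of what the kernel is meant to compute, with one outer loop iteration standing in for each work-item. The names Acc, accumulate_reference, n_factors and n_items are made up for this illustration and are not part of my OpenCL code. The sketch also assumes that index[i] + data[fi] is never negative and that mx is large enough for every computed slot.

```c
#include <stddef.h>

/* Hypothetical stand-in for float2: sum plays the role of .x, count of .y. */
typedef struct {
    float sum;
    float count;
} Acc;

/* Sequential reference for the kernel: the outer loop replaces the work-items.
 * Assumption: index[i] + data[fi] >= 0 and mx has room for every slot used. */
void accumulate_reference(const int *index, const float *values,
                          const short *data, Acc *mx,
                          size_t n_factors, size_t n_items)
{
    for (size_t factor = 0; factor < n_factors; factor++) {
        size_t fin = factor * 32;     /* same base offset as in the kernel */
        size_t fi = factor * n_items; /* start of this factor's row in data */
        for (size_t i = 0; i < n_items; i++, fi++) {
            size_t mindex = fin + (size_t)(index[i] + data[fi]);
            mx[mindex].sum += values[i];  /* running sum of values */
            mx[mindex].count += 1.0f;     /* number of elements seen */
        }
    }
}
```

Calling it with n_factors = 1024 and n_items = 100000 should correspond to the sizes used in the kernel above.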
The code runs much slower on the GPU than on the CPU.
How can I improve the performance of my implementation, or do I need a different algorithm for this task?