Hi,

This is my problem:

I have a very long array A of float values (e.g. 1 million or 10 million). All values are random and lie in the range [0…N], where N is about 1000 to 4000.

I have a second array B, which is much smaller, with just N floats. Every value of the first array has to be “put into” the second array with linear weighting.

That means, for example:

One element of the array A is “100.2”. So element “100” of the second array B is incremented by 0.8 and element “101” is incremented by 0.2.

Another element of the array A is “132.5”. So element “132” of the second array B is incremented by 0.5 and element “133” is incremented by 0.5.

…

And so on, until all 10 million elements of array A are done.
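
To make the operation concrete, here is a minimal sketch of the naive approach: one thread per element of A, with `atomicAdd` on B. It is only a baseline under some assumptions, not a tuned solution. The names `scatterLinear` and `scatterOnGpu` are made up for illustration, error checking is left out, and `sizeB` is an assumption on my side: since a value can fall between bin N and the next one, the sketch simply skips any index outside B.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread walks over A with a grid-stride loop
// and splits every value linearly between the two neighbouring bins of B.
__global__ void scatterLinear(const float* A, size_t numA, float* B, int sizeB)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < numA; i += stride) {
        float v    = A[i];
        int   lo   = (int)floorf(v);     // lower bin, e.g. 100 for 100.2
        float frac = v - (float)lo;      // weight of the upper bin, e.g. 0.2

        // Many threads may hit the same bin, so the updates must be atomic.
        // Indices outside B are skipped (assumption, see text above).
        if (lo >= 0 && lo < sizeB)         atomicAdd(&B[lo],     1.0f - frac);
        if (lo + 1 >= 0 && lo + 1 < sizeB) atomicAdd(&B[lo + 1], frac);
    }
}

// Hypothetical host helper: copies A to the GPU, runs the kernel and
// returns B. Error checking is omitted to keep the sketch short.
std::vector<float> scatterOnGpu(const std::vector<float>& hostA, int sizeB)
{
    std::vector<float> hostB(sizeB, 0.0f);
    if (hostA.empty() || sizeB <= 0) return hostB;

    float* devA = nullptr;
    float* devB = nullptr;
    cudaMalloc(&devA, hostA.size() * sizeof(float));
    cudaMalloc(&devB, sizeB * sizeof(float));
    cudaMemcpy(devA, hostA.data(), hostA.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemset(devB, 0, sizeB * sizeof(float));   // all bins start at zero

    const int threads = 256;
    size_t needed = (hostA.size() + threads - 1) / threads;
    int blocks = (int)(needed < 1024 ? needed : 1024);  // grid-stride covers the rest
    scatterLinear<<<blocks, threads>>>(devA, hostA.size(), devB, sizeB);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hostB.data(), devB, sizeB * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(devA);
    cudaFree(devB);
    return hostB;
}
```

Calling `scatterOnGpu({100.2f, 132.5f}, 1002)` should reproduce the two examples above (about 0.8 in B[100], 0.2 in B[101], 0.5 in B[132] and B[133]), up to float rounding. With only a few thousand bins and millions of inputs, the atomics on global memory will probably collide a lot; since B is small enough to fit in shared memory, a per-block copy of B that is merged at the end might be one way around that, but I have not measured it.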

Does anyone have a good idea how to do this quickly with CUDA?

Thanks for any help!