Blocks and threads question

If I have an array of numbers, for example int i[1024], and I need to multiply each number by every other number in it:
i[1] * i[2], i[1] * i[3], …, i[2] * i[1], i[2] * i[3], … and so on.
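To pin down exactly what is being computed, here is a minimal serial sketch. It assumes the results are wanted as an n × n row-major matrix with out[a*n + b] = in[a] * in[b]. The function name pairwiseProductsRef is hypothetical, and the input is called in rather than i so it does not clash with a loop index. For n = 1024 there are 1024 × 1023 = 1,047,552 products of distinct elements, counting both orders.

```cuda
#include <cassert>

// Serial reference (plain host code) for the all-pairs products in the question.
// Every element is multiplied by every element, in both orders, and the result
// is stored row-major: out[a * n + b] = in[a] * in[b].
// The diagonal (a == b) holds in[a] * in[a], which the listed pairs skip;
// callers that only want distinct pairs can simply ignore those entries.
void pairwiseProductsRef(const int* in, int n, int* out)
{
    assert(in != nullptr && out != nullptr && n > 0);
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            out[a * n + b] = in[a] * in[b];
}
```

Besides pinning down the layout, a version like this can serve as a correctness check for any GPU implementation.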

Is there an effective way to arrange this in a 2D grid, given that the number of threads per block is limited?
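One common mapping, sketched here under some assumptions, is a 2D grid with one thread per output element: the thread at global position (x, y) computes in[y] * in[x]. Square 16 × 16 blocks (256 threads) stay under the per-block thread limit, and the grid dimensions are rounded up so every pair is covered even when n is not a multiple of 16. The names pairwiseProducts and pairwiseProductsHost, the int element type, the block size, and the n × n row-major output layout are hypothetical choices for illustration, not something stated in the thread.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Edge length of a square thread block: 16 x 16 = 256 threads per block.
// Assumption: this is a reasonable default; tune it for your device.
constexpr int TILE = 16;

// One thread per (row, col) pair: the thread at global position (col, row)
// computes in[row] * in[col] and stores it at out[row * n + col].
__global__ void pairwiseProducts(const int* in, int n, int* out)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)  // guard: the rounded-up grid may overshoot n
        out[row * n + col] = in[row] * in[col];
}

// Hypothetical host wrapper: copies h_in to the device, launches a 2D grid
// covering the n x n output, and copies the result into h_out (which must
// hold n * n ints). Returns the first CUDA error encountered.
cudaError_t pairwiseProductsHost(const int* h_in, int n, int* h_out)
{
    assert(h_in != nullptr && h_out != nullptr && n > 0);
    int* d_in = nullptr;
    int* d_out = nullptr;
    size_t inBytes = n * sizeof(int);
    size_t outBytes = (size_t)n * n * sizeof(int);

    cudaError_t err = cudaMalloc(&d_in, inBytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, outBytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, h_in, inBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
        pairwiseProducts<<<grid, block>>>(d_in, n, d_out);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess)  // blocking copy also waits for the kernel to finish
        err = cudaMemcpy(h_out, d_out, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);   // cudaFree on a null pointer performs no operation
    cudaFree(d_out);
    return err;
}
```

For n = 1024 this launches a 64 × 64 grid of 16 × 16 blocks (4096 blocks, 1,048,576 threads) and needs 4 MiB for the int output. The diagonal entries (row equal to col) hold in[k] * in[k] and can be ignored if only distinct pairs are wanted. Each thread only reads in[] and writes its own output cell. Because every input value is read n times, one possible refinement is to stage each block's 16 row values and 16 column values in shared memory.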

I don’t know if this can be done in the same timestep; I think it conflicts with data independence. Maybe you can take a look at CUDPP. I don’t know whether you can use it for this, but it may still be worth checking.