Hello all. I have a question about CUDA programming.

Suppose I have an array with 10k elements. I want to compute the following expressions:

a[3] = a[5] + a[4]

a[4] = a[6] + a[5]

a[5] = a[7] + a[6]

a[6] = a[8] + a[7]

...

a[9998] = a[10000] + a[9999]

How do I arrange the threads and blocks to compute the above expressions in parallel?
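To make the question concrete, here is a minimal sketch of one possible arrangement: one thread per output element, with 256 threads per block. It rests on several assumptions that are not stated above: the elements are floats, the array is large enough that index 10000 is valid (10001 elements with 0-based indexing), and the names shiftAdd and shiftAddOnGpu are made up for illustration. A serial loop running upward from i = 3 only ever reads elements that have not been overwritten yet, so every right-hand side uses original values. The sketch therefore writes into a second buffer, which should give the same result without threads racing on a[i]. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per output element: out[i] = in[i + 1] + in[i + 2]
// for first <= i <= last. Reading from `in` and writing to `out`
// avoids a race between a thread writing a[i] and its neighbours reading it.
__global__ void shiftAdd(const float* in, float* out, int first, int last)
{
    int i = first + blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= last) {                 // threads past the end of the range do nothing
        out[i] = in[i + 1] + in[i + 2];
    }
}

// Hypothetical host helper: copies the array to the GPU, launches the
// kernel, and returns the updated array. Requires last + 2 < a.size().
std::vector<float> shiftAddOnGpu(const std::vector<float>& a, int first, int last)
{
    size_t bytes = a.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, a.data(), bytes, cudaMemcpyHostToDevice);
    // Start the output as a copy so untouched elements keep their values.
    cudaMemcpy(dOut, dIn, bytes, cudaMemcpyDeviceToDevice);

    int count = last - first + 1;                    // number of elements to update
    int threads = 256;                               // arbitrary block size (assumption)
    int blocks = (count + threads - 1) / threads;    // round up to cover every element
    shiftAdd<<<blocks, threads>>>(dIn, dOut, first, last);

    std::vector<float> result(a.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}

int main()
{
    std::vector<float> a(10001);                     // indices 0..10000 (assumption)
    for (size_t i = 0; i < a.size(); ++i) a[i] = static_cast<float>(i);

    std::vector<float> r = shiftAddOnGpu(a, 3, 9998);
    std::printf("r[3] = %.0f, r[9998] = %.0f\n", r[3], r[9998]);
    return 0;
}
```

With 9996 elements to update and 256 threads per block, this launches 40 blocks. The extra threads in the last block are masked out by the bounds check.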

Thanks a lot. This concept is very important for my work.