Assign part of an array to each thread block

Dear all,

I am still new to CUDA.
I have a project with an array A of length N and a second array B.
I want to divide array A into segments, with each segment assigned to a specific thread block. B holds the last index of each segment.
e.g.
A[12]={1,4,5,3,6,3,2,5,2,1,1,2};
B[3]={3,6,11};

I want to launch three blocks: the first block calculates the sum of elements 0-3, the second block the sum of elements 4-6, and the third block the sum of elements 7-11.

I am not asking for code. I am asking for an algorithm.

thanks all

for ( int i = 0; i < N/3; ++i ) {
  int t = i*3;
  // calculate using A[t..t+2] and B[0..2]
}

Launch 3 blocks. In each block do a block-level reduction on the appropriate data set. Use the blockIdx.x built-in variable in your CUDA kernel to select the appropriate element of an array that defines the data boundaries.
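To make that concrete, here is a minimal sketch of a hand-written version. It assumes B stores the inclusive last index of each segment (as in the question's example), that the block size is a power of two, and that the data are int. The names segmentSumKernel, segmentSums and kThreads are made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;  // threads per block; assumed to be a power of two

// One block per segment. ends[b] is the inclusive last index of segment b;
// segment b starts at ends[b - 1] + 1 (or at 0 for the first block).
__global__ void segmentSumKernel(const int* a, const int* ends, int* sums)
{
    __shared__ int partial[kThreads];

    const int seg   = blockIdx.x;
    const int first = (seg == 0) ? 0 : ends[seg - 1] + 1;
    const int last  = ends[seg];

    // Each thread accumulates a strided subset of its block's segment.
    int local = 0;
    for (int i = first + threadIdx.x; i <= last; i += blockDim.x)
        local += a[i];
    partial[threadIdx.x] = local;
    __syncthreads();

    // Shared-memory tree reduction across the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        sums[seg] = partial[0];
}

// Host helper (hypothetical): copies data to the device, launches one block
// per segment, and returns the per-segment sums. No error checking.
std::vector<int> segmentSums(const std::vector<int>& a, const std::vector<int>& ends)
{
    const int numSegments = static_cast<int>(ends.size());
    std::vector<int> sums(numSegments);
    if (numSegments == 0 || a.empty()) return sums;

    int *dA = nullptr, *dEnds = nullptr, *dSums = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(int));
    cudaMalloc(&dEnds, ends.size() * sizeof(int));
    cudaMalloc(&dSums, sums.size() * sizeof(int));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dEnds, ends.data(), ends.size() * sizeof(int), cudaMemcpyHostToDevice);

    segmentSumKernel<<<numSegments, kThreads>>>(dA, dEnds, dSums);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(sums.data(), dSums, sums.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dEnds);
    cudaFree(dSums);
    return sums;
}
```

With the arrays from the question, segmentSums(A, B) should launch three blocks and return one sum per segment. For segments as short as these, most threads in each block have nothing to add, so a smaller block size would also do.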

You can write your own block-level reduction, but the CUDA reduction sample code is a good thing to review:

https://docs.nvidia.com/cuda/samples/6_Advanced/reduction/doc/reduction.pdf

and CUB provides block-level reduction as a library operation:

https://nvlabs.github.io/cub/index.html
https://nvlabs.github.io/cub/classcub_1_1_block_reduce.html
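
As a rough sketch of how the CUB route might look: cub::BlockReduce replaces the hand-written shared-memory loop. This assumes B holds the inclusive last index of each segment, int data, and a block size fixed at compile time. The names segmentSumCubKernel, segmentSumsCub and kBlockThreads are made up for illustration, and error checking is left out.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockThreads = 128;  // must match the launch configuration

// One block per segment; ends[b] is the inclusive last index of segment b.
__global__ void segmentSumCubKernel(const int* a, const int* ends, int* sums)
{
    using BlockReduce = cub::BlockReduce<int, kBlockThreads>;
    __shared__ typename BlockReduce::TempStorage tempStorage;

    const int seg   = blockIdx.x;
    const int first = (seg == 0) ? 0 : ends[seg - 1] + 1;
    const int last  = ends[seg];

    // Each thread accumulates a strided subset of its block's segment.
    int local = 0;
    for (int i = first + threadIdx.x; i <= last; i += kBlockThreads)
        local += a[i];

    // Block-wide sum; the result is only valid in thread 0.
    const int total = BlockReduce(tempStorage).Sum(local);
    if (threadIdx.x == 0)
        sums[seg] = total;
}

// Host helper (hypothetical): one block per entry in ends. No error checking.
std::vector<int> segmentSumsCub(const std::vector<int>& a, const std::vector<int>& ends)
{
    const int numSegments = static_cast<int>(ends.size());
    std::vector<int> sums(numSegments);
    if (numSegments == 0 || a.empty()) return sums;

    int *dA = nullptr, *dEnds = nullptr, *dSums = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(int));
    cudaMalloc(&dEnds, ends.size() * sizeof(int));
    cudaMalloc(&dSums, sums.size() * sizeof(int));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dEnds, ends.data(), ends.size() * sizeof(int), cudaMemcpyHostToDevice);

    segmentSumCubKernel<<<numSegments, kBlockThreads>>>(dA, dEnds, dSums);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(sums.data(), dSums, sums.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dEnds);
    cudaFree(dSums);
    return sums;
}
```

The block size is a template parameter of cub::BlockReduce, so the kernel has to be launched with exactly kBlockThreads threads per block.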