Overflow in CUDA cumulative results

I am working with Numba CUDA and trying to compute the cumulative sum of a matrix. Whenever a value exceeds 127, I get an overflow and wrong values. The whole algorithm runs in shared memory. My laptop has an NVIDIA GeForce GTX 860.

I have already tried declaring the shared memory array as uint8, uint16, uint32 and float32, without success.
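127 is the largest value a signed 8-bit integer (int8) can hold. Because even a uint8 shared array would go up to 255, a cutoff at 127 hints that some other array in the pipeline is int8. A minimal NumPy sketch of the wrap-around, assuming the running totals end up stored in an int8 array (the post does not show which array that is):

```python
import numpy as np

# Running totals computed in a wide type (int64 here): 100, 150, 180
totals = np.cumsum(np.array([100, 50, 30], dtype=np.int64))

# Casting into a signed 8-bit array wraps anything above 127 modulo 256,
# so 150 becomes 150 - 256 = -106 and 180 becomes -76.
stored = totals.astype(np.int8)

print(np.iinfo(np.int8).max)  # 127
print(totals.tolist())        # [100, 150, 180]
print(stored.tolist())        # [100, -106, -76]
```

NumPy performs this cast silently, so the wrong values show up without any error or warning.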

Solved: I had to change the data type on both the CUDA side and the CPU side. The CPU-side array was the actual problem.