I am working with Numba CUDA and trying to compute the cumulative sum of a matrix. Whenever a value exceeds 127, I get overflow / wrong values. The whole algorithm runs within shared memory. My laptop has an NVIDIA GeForce GTX 860.
However, I have already tried declaring the shared memory as uint8, uint16, uint32 and float32, and the values are still wrong.
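
For reference, 127 is the largest value a signed 8-bit integer can hold, so the symptom looks like int8 wraparound somewhere in the pipeline, possibly in the input or output array rather than in the shared buffer. That is an assumption, since the kernel itself is not shown. A minimal NumPy sketch of the effect, using a made-up `row` array (not the real data):

```python
import numpy as np

# Hypothetical input: a row whose running total passes 127.
row = np.full(10, 20, dtype=np.int8)

# Accumulating in int8 wraps around once the total exceeds 127.
wrapped = np.cumsum(row, dtype=np.int8)

# Accumulating in a wider type keeps the true totals.
correct = np.cumsum(row, dtype=np.int32)

print(wrapped)
print(correct)
```

If the real arrays behave like `wrapped`, checking the `dtype` of every array passed to and returned from the kernel (not only the shared array) may be worth a try.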