Program works in device emulation but not on the device

I have a program that works perfectly in device emulation mode but fails on the actual device. It also worked in normal (non-emulation) mode until I changed it so that each block loads a different matrix, chosen by its block index, into the same shared variable. I thought shared memory was separate for each block, so that different blocks could store different values in the same shared variable. Does anyone have any ideas on what is going wrong?
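For reference, the understanding in the question is correct: every block gets its own copy of a `__shared__` variable, so blocks can load different data into it by indexing on `blockIdx.x`. Below is a minimal sketch of that pattern. The kernel name `loadPerBlock`, the helper `run_demo`, the sizes `N` and `NUM_BLOCKS`, and the transpose used to exercise the shared tile are all hypothetical and are not taken from the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical sizes: each block handles one N x N matrix.
#define N 4
#define NUM_BLOCKS 3
#define TOTAL (NUM_BLOCKS * N * N)

// Every block has a private copy of `tile`; block 0's writes are never
// visible to block 1, even though both use the same variable name.
__global__ void loadPerBlock(const float *matrices, float *out)
{
    __shared__ float tile[N * N];

    // Choose this block's matrix using the block index.
    const float *src = matrices + blockIdx.x * N * N;

    // One thread per element (launched with N * N threads per block).
    int t = threadIdx.x;
    tile[t] = src[t];
    __syncthreads(); // the whole tile is now visible to every thread in the block

    // Read a different element than this thread wrote, so the barrier matters:
    // output the transpose of this block's matrix.
    int row = t / N, col = t % N;
    out[blockIdx.x * N * N + t] = tile[col * N + row];
}

// Returns nonzero (after printing a message) if a runtime call failed.
static int check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int run_demo(void)
{
    float h_in[TOTAL], h_out[TOTAL];
    for (int i = 0; i < TOTAL; ++i) h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL;
    int failed =
        check(cudaMalloc(&d_in, sizeof h_in), "cudaMalloc(d_in)") ||
        check(cudaMalloc(&d_out, sizeof h_out), "cudaMalloc(d_out)") ||
        check(cudaMemcpy(d_in, h_in, sizeof h_in, cudaMemcpyHostToDevice), "copy to device");

    if (!failed) {
        loadPerBlock<<<NUM_BLOCKS, N * N>>>(d_in, d_out);
        // cudaGetLastError reports launch errors; the blocking copy back
        // surfaces errors that occurred while the kernel ran.
        failed = check(cudaGetLastError(), "kernel launch") ||
                 check(cudaMemcpy(h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost), "copy to host");
    }
    cudaFree(d_in);  // cudaFree(NULL) is a no-op
    cudaFree(d_out);
    if (failed) return 1;

    // Each block must have transposed its own matrix, not another block's.
    for (int b = 0; b < NUM_BLOCKS; ++b)
        for (int r = 0; r < N; ++r)
            for (int c = 0; c < N; ++c)
                if (h_out[b * N * N + r * N + c] != h_in[b * N * N + c * N + r]) {
                    fprintf(stderr, "mismatch in block %d\n", b);
                    return 1;
                }
    printf("each block saw only its own matrix\n");
    return 0;
}

int main(void) { return run_demo() == 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
```

Because the shared tile is per block, the same variable name can safely hold a different matrix in each block; what cannot be relied on is any communication between blocks through shared memory.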

Src plz

NM, I figured out that the problem was that I was using up all of the card's memory, and that was causing the data corruption. Thanks anyway.
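One way to make this kind of failure show up as a clear error rather than as corrupted data is to check the return code of every CUDA runtime call and to query how much memory the device actually has. The following is a minimal sketch under that assumption. The helpers `ok`, `allocChecked` and `run_checks` are hypothetical and not part of the original program, and the oversized request is deliberately constructed to fail.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Print a readable message and return false if a runtime call failed.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: allocate `bytes` of device memory only if it fits
// in what is currently free; otherwise report it and return NULL.
static float *allocChecked(size_t bytes)
{
    size_t freeBytes = 0, totalBytes = 0;
    if (!ok(cudaMemGetInfo(&freeBytes, &totalBytes), "cudaMemGetInfo")) return NULL;
    if (bytes > freeBytes) {
        fprintf(stderr, "requested %zu bytes, only %zu of %zu free\n",
                bytes, freeBytes, totalBytes);
        return NULL;
    }
    float *p = NULL;
    if (!ok(cudaMalloc(&p, bytes), "cudaMalloc")) return NULL;
    return p;
}

int run_checks(void)
{
    cudaDeviceProp prop;
    if (!ok(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties")) return 1;
    printf("%s: %zu bytes global memory, %zu bytes shared memory per block\n",
           prop.name, prop.totalGlobalMem, prop.sharedMemPerBlock);

    // Twice the card's total memory can never fit, so this must be refused.
    float *tooBig = allocChecked(2 * prop.totalGlobalMem);
    if (tooBig != NULL) {
        cudaFree(tooBig);
        return 1;
    }

    // A small allocation should succeed on any working device.
    float *small = allocChecked(1024 * sizeof(float));
    if (small == NULL) return 1;
    cudaFree(small);
    return 0;
}

int main(void) { return run_checks() == 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
```

The same pattern applies after a kernel launch: `cudaGetLastError()` reports launch failures, and the error returned by the next synchronizing call (for example `cudaDeviceSynchronize()` or a blocking `cudaMemcpy`) reports errors that happened while the kernel was running.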