Same code gives different results on different graphics cards

I ran the same code on two computers with different graphics cards and got different results.
The cards are an 8800GTS (CUDA compute capability 1.0) and an 8600GTS (CUDA compute capability 1.1).
When the code uses only global memory, both cards produce the same results.
However, when I use shared memory instead of global memory, the results differ.
The result from the 8600GTS is correct, but the result from the 8800GTS is wrong.
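
To double-check which version each card actually reports, a small device query like the sketch below could be run on both machines. This is not part of my original code; it is only a minimal illustration that relies on the CUDA runtime calls cudaGetDeviceCount and cudaGetDeviceProperties, and it assumes a toolkit and compiler that accept this host code as written.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch only: prints what each installed GPU reports,
// so the two machines can be compared side by side.
int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices that cannot be queried
        }
        // major.minor is the compute capability (e.g. 1.0 or 1.1).
        std::printf("Device %d: %s, compute capability %d.%d, "
                    "%d multiprocessors, %zu bytes shared memory per block\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount, prop.sharedMemPerBlock);
    }
    return 0;
}
```

Comparing the output from the two machines would show whether the cards differ only in the reported version or also in multiprocessor count and shared memory size.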

My question is: why does the same code give different results on these two graphics cards?
Could it be because of the difference between versions 1.0 and 1.1, or is there some other reason?
Can anyone explain this?