Problem with pthreads for multi-GPU: kernels only return zeros

Hi,
I am following the simpleMultiGPU sample project to process some data on two GPUs, an 8800GT and a 9500GT.
But for some reason, when I set up the pthreads as in simpleMultiGPU, all results are zeros. When I do not use pthreads, the results are normal.
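
For reference, here is a minimal sketch of the one-thread-per-GPU pattern described above, with every CUDA call checked so that a silent failure shows up as an error message instead of zeros. It is not the simpleMultiGPU source or the actual project code. The names `GpuJob`, `fillKernel`, `check`, `worker`, and `runOnGpus` are hypothetical. It assumes a CUDA version in which each host thread has its own context, so the device is selected and the memory allocated inside the worker thread.

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

#define N 1024
#define MAX_GPUS 8

// Hypothetical per-thread job description (not taken from simpleMultiGPU).
struct GpuJob {
    int    device;  // GPU this thread should drive
    float *h_out;   // host buffer receiving this GPU's results
    int    failed;  // set to 1 if any CUDA call reports an error
};

static float results[MAX_GPUS][N];

// Hypothetical kernel: writes nonzero values so a silent failure is obvious.
__global__ void fillKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f + i;
}

// Print a CUDA error and mark the job as failed; returns 1 on error.
static int check(cudaError_t err, const char *what, GpuJob *job)
{
    if (err == cudaSuccess) return 0;
    fprintf(stderr, "GPU %d: %s failed: %s\n",
            job->device, what, cudaGetErrorString(err));
    job->failed = 1;
    return 1;
}

static void *worker(void *arg)
{
    GpuJob *job = (GpuJob *)arg;
    float *d_out = NULL;

    // Select the device inside the thread, before any other CUDA call.
    if (check(cudaSetDevice(job->device), "cudaSetDevice", job)) return NULL;
    // Allocate device memory in the same thread that launches the kernel.
    if (check(cudaMalloc((void **)&d_out, N * sizeof(float)), "cudaMalloc", job))
        return NULL;

    fillKernel<<<(N + 255) / 256, 256>>>(d_out, N);
    check(cudaGetLastError(), "kernel launch", job);
    // cudaMemcpy waits for the kernel and reports execution errors as well.
    check(cudaMemcpy(job->h_out, d_out, N * sizeof(float),
                     cudaMemcpyDeviceToHost), "cudaMemcpy", job);
    cudaFree(d_out);
    return NULL;
}

// Runs one worker thread per GPU; returns the number of jobs that failed.
int runOnGpus(int count)
{
    pthread_t threads[MAX_GPUS];
    GpuJob jobs[MAX_GPUS];
    int started[MAX_GPUS];
    int failures = 0;

    for (int i = 0; i < count; ++i) {
        jobs[i].device = i;
        jobs[i].h_out  = results[i];
        jobs[i].failed = 0;
        started[i] = (pthread_create(&threads[i], NULL, worker, &jobs[i]) == 0);
        if (!started[i]) jobs[i].failed = 1;
    }
    for (int i = 0; i < count; ++i) {
        if (started[i]) pthread_join(threads[i], NULL);
        failures += jobs[i].failed;
    }
    return failures;
}

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No usable CUDA device found\n");
        return 1;
    }
    if (count > MAX_GPUS) count = MAX_GPUS;

    int failures = runOnGpus(count);
    for (int i = 0; i < count; ++i)
        printf("GPU %d: first result = %f\n", i, results[i][0]);
    return failures ? 1 : 0;
}
```

If a version like this prints an error for one of the devices, the zeros most likely come from a CUDA call that failed without being checked, not from the kernel itself.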
Does anyone know what the problem is?
Help is greatly appreciated, thanks.