Forcing serial computation

Does anybody know of a quick way, for debugging purposes, to force the GPU into performing all computations serially?
i.e., to invoke only one thread at a time?

Thanks,
Jon.

Try running in device emulation mode with CUDA 3.0 or earlier. In emulation mode the warp size is 1, which may help. Also compare the results of debug and release builds.
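
If emulation mode is not available in your toolkit, another option that may work is to launch the kernel with a single block of a single thread. This is only a sketch under some assumptions: the kernel `scale` and its arguments are made up for illustration, it relies on a grid-stride loop so that one thread still covers every element, and it will not help if your kernel depends on `__syncthreads()` or on threads cooperating through shared memory. It also uses `cudaDeviceSynchronize()`, which needs a newer toolkit than 3.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel. The grid-stride loop means any launch
// configuration, including <<<1, 1>>>, processes all n elements.
__global__ void scale(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main()
{
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = (float)i;
    }

    float *dev = NULL;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block, one thread: elements are handled in index order
    // by a single thread, so there is no concurrency inside the kernel.
    scale<<<1, 1>>>(dev, n, 2.0f);

    // Wait for the kernel and report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) {
        printf("%d: %f\n", i, host[i]);
    }
    return 0;
}
```

If the serial launch gives correct results and the normal launch does not, that points towards a race condition or a missing synchronization rather than a plain arithmetic bug.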

Thanks… Didn’t think of that :)
