CUDA on two devices

I would greatly appreciate it if someone could post sample code or recommendations for working with two or more devices.

The sample from the SDK seems to fail on my system with two GeForce 8800 Ultra cards, while it works correctly on a system with one GeForce 8800 Ultra.

One thing I am interested in is how to work correctly with static and global variables declared in .cu files when using two devices (and, as a consequence, two CPU threads).

BTW, is it possible to emulate two devices in emulation mode?

Thank you in advance.