Invalid device ordinal error on multi-GPU system

In the attached file, I get a runtime error on line 10 of lock.cuh. The code is essentially the lock implementation from CUDA by Example. On systems with one device, the code works fine. However, on a multi-GPU system, I get the following error:

./lock.cuh(10) : cudaSafeCall() Runtime API error : invalid device ordinal.

In the calling code, I make sure to set the CUDA device before creating any lock objects. Any thoughts on remedying this problem?

I even tried setting the CUDA device in each instantiation of the object, but that still throws an error.
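
For reference, here is a minimal sketch of the pattern described above, not the attached file itself. CHECK_CUDA is a hypothetical stand-in for cudaSafeCall(), the Lock struct only approximates the one in CUDA by Example, and the command-line device argument is an assumption for illustration. It checks the requested ordinal against cudaGetDeviceCount() before calling cudaSetDevice(), and prints the device that is current when the lock is allocated:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro standing in for cudaSafeCall().
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s(%d) : CUDA error : %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Spin lock in the style of CUDA by Example (an approximation).
// The mutex is allocated on whichever device is current in the
// constructing host thread.
struct Lock {
    int *mutex;

    Lock() : mutex(nullptr) {
        int dev = -1;
        CHECK_CUDA(cudaGetDevice(&dev));
        printf("Lock: allocating on device %d\n", dev);
        CHECK_CUDA(cudaMalloc((void **)&mutex, sizeof(int)));
        CHECK_CUDA(cudaMemset(mutex, 0, sizeof(int)));
    }

    ~Lock() { cudaFree(mutex); }

    // Copying is disabled so the mutex is never freed twice.
    Lock(const Lock &) = delete;
    Lock &operator=(const Lock &) = delete;

    __device__ void lock()   { while (atomicCAS(mutex, 0, 1) != 0) {} }
    __device__ void unlock() { atomicExch(mutex, 0); }
};

int main(int argc, char **argv) {
    int count = 0;
    CHECK_CUDA(cudaGetDeviceCount(&count));

    // Assumed for illustration: the device ordinal comes from argv[1].
    int requested = (argc > 1) ? atoi(argv[1]) : 0;
    if (requested < 0 || requested >= count) {
        fprintf(stderr, "device %d is out of range: %d device(s) visible\n",
                requested, count);
        return EXIT_FAILURE;
    }

    CHECK_CUDA(cudaSetDevice(requested));

    Lock lock;  // constructed only after the device has been selected
    return 0;
}
```

Two things this sketch is meant to help rule out: the current device is tracked per host thread, so a cudaSetDevice() call in one thread does not apply to locks constructed in another, and Lock objects with static or global storage are constructed before main() runs, which is before any cudaSetDevice() call made there.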