Unable to launch CUFFT on multiple GPUs

Hi,
My card is a GTX 295. I am trying to compute separate FFTs on both of its GPU cores simultaneously. However, profiling with Nsight shows two contexts running on GPU #0 and no context on the other. Can anybody show me how to use CUFFT with multiple GPUs? (I followed the CUDA SDK's simple multi-GPU example and call cudaSetDevice in each thread. Launching simple kernels on both GPUs works, but once I call cufftPlan, all the kernels are launched on GPU #0 instead.)
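
For reference, here is a minimal sketch of the per-thread setup described above. It is not the actual code from this project: the function name fft_on_device, the transform size, and the use of std::thread and cufftPlan1d are all assumptions made for illustration. The one point it tries to show is that cudaSetDevice is called in each host thread before the plan is created, since a CUFFT plan is tied to the device that is current at creation time.

```cuda
// Illustrative sketch only: one host thread per GPU, each creating its own
// CUFFT plan after selecting its device. Names and sizes are hypothetical.
// Build (assumed): nvcc -std=c++11 multi_gpu_fft.cu -lcufft
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

static void fft_on_device(int device, int n) {
    // Select the device for THIS host thread before any other CUDA/CUFFT call.
    if (cudaSetDevice(device) != cudaSuccess) {
        std::printf("device %d: cudaSetDevice failed\n", device);
        return;
    }

    cufftComplex* d_data = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d_data),
                   sizeof(cufftComplex) * n) != cudaSuccess) {
        std::printf("device %d: cudaMalloc failed\n", device);
        return;
    }
    cudaMemset(d_data, 0, sizeof(cufftComplex) * n);

    // The plan is created while `device` is current, so it belongs to it.
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        std::printf("device %d: cufftPlan1d failed\n", device);
        cudaFree(d_data);
        return;
    }

    // In-place forward transform, then wait for it to finish.
    if (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) != CUFFT_SUCCESS) {
        std::printf("device %d: cufftExecC2C failed\n", device);
    }
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(d_data);
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }

    // One worker thread per visible GPU (two on a GTX 295).
    std::vector<std::thread> workers;
    for (int d = 0; d < count; ++d) {
        workers.emplace_back(fft_on_device, d, 1 << 20);
    }
    for (auto& t : workers) {
        t.join();
    }
    return 0;
}
```

If the profiler still shows both transforms on GPU #0 with a layout like this, the device selection is probably not taking effect in the thread that creates the plan.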

Also, would setting the Compute Mode to EXCLUSIVE help with my problem? It seems Compute Mode can only be set on Tesla cards. For those who have experience using multi-GPU CUFFT on Tesla, could you tell me whether any special configuration is needed?

Supplement: SLI is disabled in the NVIDIA Control Panel.