How to use 2 devices

I’m wondering how I can make use of the 2 GPUs on the GeForce 9800GX2 card. If I have only 1 kernel, does that mean I can run it on only 1 GPU?

To use more than one GPU, you either need to run two copies of your program (each calling cudaSetDevice() with a different ID number), or your program has to start two threads, each of which calls cudaSetDevice() with a different ID number. But you are correct: a single kernel launch runs on only one GPU, so to use both you have to split the work and launch a kernel on each device.
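A minimal sketch of the two-thread approach, assuming a C++11 compiler (nvcc) and that the 9800GX2 shows up as CUDA devices 0 and 1. The kernel `scaleKernel` and the helpers `splitRange`, `workerThread` and `scaleOnTwoGpus` are hypothetical names for illustration; each host thread binds to one GPU with cudaSetDevice() and processes its own half of the array. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Example kernel (hypothetical): multiply every element by a constant.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Half-open range [begin, end) of elements handled by slice `part` of `parts`.
struct Range { int begin; int end; };

Range splitRange(int n, int parts, int part) {
    assert(parts > 0);
    int chunk = (n + parts - 1) / parts;
    int begin = part * chunk < n ? part * chunk : n;
    int end = begin + chunk < n ? begin + chunk : n;
    return {begin, end};
}

// Body of one host thread: bind to `device`, then process its slice of `host`.
void workerThread(int device, float* host, Range r, float factor) {
    cudaSetDevice(device);  // CUDA calls below in this thread target `device`
    int count = r.end - r.begin;
    if (count <= 0) return;
    float* d = nullptr;
    cudaMalloc(&d, count * sizeof(float));
    cudaMemcpy(d, host + r.begin, count * sizeof(float), cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(d, count, factor);
    // Blocking copy on the default stream, so it waits for the kernel to finish.
    cudaMemcpy(host + r.begin, d, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

// Split `host` across up to two GPUs, one host thread per GPU.
void scaleOnTwoGpus(std::vector<float>& host, float factor) {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    int gpus = deviceCount < 2 ? deviceCount : 2;
    std::vector<std::thread> workers;
    for (int dev = 0; dev < gpus; ++dev) {
        Range r = splitRange(static_cast<int>(host.size()), gpus, dev);
        workers.emplace_back(workerThread, dev, host.data(), r, factor);
    }
    for (auto& t : workers) t.join();
}
```

Each thread writes a disjoint slice of the host array, so no locking is needed. With CUDA 4.0 and later, a single host thread can also drive several GPUs by calling cudaSetDevice() before each launch; the one-thread-per-GPU layout above follows the approach described in the post.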

I see, thanks. Another problem with this 9800GX2 card is that the first cudaMalloc() call takes a long time, much longer than on the 8800GTX card I used before. Are there any hints that could explain it?
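The CUDA runtime has no explicit initialization call; it sets itself up (including creating the device context) the first time a runtime function is used, so a "slow first cudaMalloc()" may really be measuring that one-time setup. A minimal sketch for timing the two separately, assuming the common idiom of calling cudaFree(0) to force context creation; `msSince`, `StartupTimes` and `timeStartup` are hypothetical helper names.

```cuda
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cuda_runtime.h>

// Milliseconds elapsed since `start`, measured on the host clock.
double msSince(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();
}

struct StartupTimes { double contextMs; double mallocMs; };

// Time runtime/context setup and the first allocation on `device` separately.
StartupTimes timeStartup(int device, std::size_t bytes) {
    assert(bytes > 0 && "allocation size must be positive");
    StartupTimes t{0.0, 0.0};

    auto start = std::chrono::steady_clock::now();
    cudaSetDevice(device);
    cudaFree(0);  // idiom: forces the runtime to initialize the context now
    t.contextMs = msSince(start);

    void* p = nullptr;
    start = std::chrono::steady_clock::now();
    cudaMalloc(&p, bytes);  // context already exists, so this times only the allocation
    t.mallocMs = msSince(start);
    cudaFree(p);
    return t;
}
```

Calling timeStartup() on each card and comparing contextMs with mallocMs shows whether the extra time is spent in setup or in the allocation itself.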