How do I use two C870 devices in parallel on a multi-GPU Tesla system?

Hi,

I have a code base written for a GeForce 8800 GTX. I now have access to a Tesla system with two C870 devices:

/bin/linux/release/deviceQuery 
There are 2 devices supporting CUDA

Device 0: "Tesla C870"
  Major revision number:                         1
  Minor revision number:                         0
  Total amount of global memory:                 1610350592 bytes
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1350000 kilohertz

Device 1: "Tesla C870"
  Major revision number:                         1
  Minor revision number:                         0
  Total amount of global memory:                 1610350592 bytes
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1350000 kilohertz

Test PASSED

Is there a way to reuse my existing code to issue twice as many threads in parallel?

thanks

kpg

To use multiple GPUs at once, you'll either need to create multiple threads on the host and attach each one to a different GPU, or open each GPU in a separate process. At present there's no feature for automatically splitting work across multiple GPUs from a single CUDA kernel launch, so you have to do the split yourself. It's pretty easy to do. I describe this in more detail in some of the UIUC ECE498 talks I gave; the PowerPoint files should still be posted online.

Cheers,
John Stone
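
Below is a minimal sketch of the pattern described above, assuming a placeholder kernel called scale and an illustrative even split of one input array (error checking omitted): each host pthread calls cudaSetDevice() to bind one GPU and then runs the unchanged kernel on its own slice of the data.

// multi_gpu.cu -- illustrative sketch: one host pthread per GPU, each GPU
// runs the same (unchanged) kernel on its own slice of the input array.
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

// Placeholder kernel; substitute the kernel from the existing 8800 GTX code.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

struct GpuJob {
    int    device;  // which GPU this host thread drives
    float *host;    // this thread's slice of the host array
    int    n;       // number of elements in the slice
};

static void *worker(void *arg)
{
    GpuJob *job = static_cast<GpuJob *>(arg);

    // Must come before any other CUDA call in this thread: it binds the
    // thread's CUDA context to the chosen device.
    cudaSetDevice(job->device);

    size_t bytes = job->n * sizeof(float);
    float *dev = 0;
    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, job->host, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (job->n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dev, job->n, 2.0f);

    cudaMemcpy(job->host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return 0;
}

int main()
{
    const int N = 1 << 20;
    float *h = new float[N];
    for (int i = 0; i < N; ++i)
        h[i] = 1.0f;

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);   // reports 2 on the system above

    // Split the array evenly across the GPUs; the last slice takes the rest.
    pthread_t *tid = new pthread_t[deviceCount];
    GpuJob    *job = new GpuJob[deviceCount];
    int chunk = N / deviceCount;
    for (int d = 0; d < deviceCount; ++d) {
        job[d].device = d;
        job[d].host   = h + d * chunk;
        job[d].n      = (d == deviceCount - 1) ? N - d * chunk : chunk;
        pthread_create(&tid[d], 0, worker, &job[d]);
    }
    for (int d = 0; d < deviceCount; ++d)
        pthread_join(tid[d], 0);

    printf("h[0] = %f  h[N-1] = %f\n", h[0], h[N - 1]);

    delete[] tid;
    delete[] job;
    delete[] h;
    return 0;
}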

Take a look at the multiGPU example in the SDK. It basically just uses pthread_create (on Linux) to spawn as many host threads as there are GPUs, and then runs a kernel on each Tesla.

Using two GPUs is more about programming with POSIX threads than it is about programming in CUDA.
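
To build and run a sketch like the one above on Linux, link against pthreads (the file name multi_gpu.cu is just the illustrative name used here):

nvcc -o multi_gpu multi_gpu.cu -lpthread
./multi_gpu

The kernel itself stays exactly as it was written for the single 8800 GTX; only the host side changes. Since each host thread gets its own CUDA context, cudaSetDevice() has to be the first CUDA call made in each thread.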