Hi,
I have a code base written for a GTX 8800. I now have access to a Tesla system with C870 devices
/bin/linux/release/deviceQuery
There are 2 devices supporting CUDA
Device 0: "Tesla C870"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1610350592 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz
Device 1: "Tesla C870"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1610350592 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz
Test PASSED
Is there a way to reuse my existing code to issue 2X number of threads in parallel ?
thanks
kpg