Device Selection

Howdy, I have a 9800 GX2 and a C1060. See the deviceQuery output below:

[codebox]Device 0: "GeForce 9800 GX2"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536608768 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.51 GHz
Concurrent copy and execution: Yes

Device 1: "GeForce 9800 GX2"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.51 GHz
Concurrent copy and execution: Yes

Device 2: "Tesla C1060"
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes

Test PASSED[/codebox]

I am using atomicExch() on shared memory, which requires compute capability 1.2 or higher, so I compile for 1.3 to target the C1060. If I run the code on device 2, I get a whole bunch of random garbage as output, but if I use device 0 the results are reasonable. Is the system seeing that the C1060 is the only 1.3 card and assigning it device 0, or am I missing something? None of this makes any sense. Thanks
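One way to check which index the runtime actually gives the C1060 is to query each device's compute capability from inside the program, rather than relying on the numbering in the listing above. The following is only a minimal sketch, assuming the CUDA runtime API; `meetsCapability` and `findDevice` are hypothetical helper names, not part of CUDA, and the 1.2 threshold is the shared-memory atomics requirement mentioned above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if compute capability major.minor is at least
// reqMajor.reqMinor.
static bool meetsCapability(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Hypothetical helper: prints every device the runtime reports and returns
// the index of the first one that meets the requirement, or -1 if none does.
static int findDevice(int reqMajor, int reqMinor) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return -1;
    }
    int chosen = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }
        printf("Device %d: %s (compute %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        if (chosen < 0 &&
            meetsCapability(prop.major, prop.minor, reqMajor, reqMinor)) {
            chosen = dev;
        }
    }
    return chosen;
}

int main() {
    // Shared-memory atomics need compute capability 1.2 or higher.
    int dev = findDevice(1, 2);
    if (dev < 0) {
        fprintf(stderr, "No device with compute capability >= 1.2 found\n");
        return 1;
    }

    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                dev, cudaGetErrorString(err));
        return 1;
    }
    printf("Using device %d\n", dev);

    // ... launch kernels here, then check whether the launch succeeded:
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Checking `cudaGetLastError()` after each launch may also help tell a kernel that never ran (leaving uninitialized output) apart from one that ran and produced wrong results.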