cuCtxCreate hangs forever on Tesla S1070

Hi,
I’m experiencing a problem with my Tesla S1070: I can no longer use the first 2 devices. For example, a call to
cuCtxCreate hangs forever on the first 2 devices, but it works on the last 2. Can anyone help me identify
the problem before I reboot my host?

Thanks
Gaetano Mendola
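
One way to check which devices hang without blocking the shell is to run each context-creation attempt in a child process with a timeout. The following is only a minimal sketch under assumptions: the helper command `./ctx_probe <ordinal>` is hypothetical (a small program that calls cuCtxCreate on the given device and exits 0 on success), and the names `probe` and `probe_devices` are made up for illustration.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd in a child process and return 'ok', 'error', or 'hang'."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The command could not be started at all (e.g. helper not found).
        return "error"
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a child stuck inside a driver
        # call may not die, and waiting on it would block this script too.
        proc.kill()
        return "hang"
    return "ok" if rc == 0 else "error"


def probe_devices(helper, count, timeout_s):
    """Probe device ordinals 0..count-1 with a hypothetical helper binary."""
    return {i: probe([helper, str(i)], timeout_s) for i in range(count)}
```

For example, `probe_devices("./ctx_probe", 5, 10)` would return a dictionary mapping each device ordinal to its status, so a device whose context creation never returns shows up as "hang" after 10 seconds instead of freezing the terminal.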

Anything about it in /var/log/messages? (maybe the cable came out)

The SDK’s deviceQuery application reports all devices, so I don’t think it can be a cable problem:

There are 5 devices supporting CUDA

Device 0: "Tesla T10 Processor"
  Major revision number: 1
  Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.44 GHz
  Concurrent copy and execution: Yes

Device 1: "Quadro NVS 290"
  Major revision number: 1
  Minor revision number: 1
  Total amount of global memory: 267714560 bytes
  Number of multiprocessors: 2
  Number of cores: 16
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 0.92 GHz
  Concurrent copy and execution: Yes

Device 2: "Tesla T10 Processor"
  Major revision number: 1
  Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.44 GHz
  Concurrent copy and execution: Yes

Device 3: "Tesla T10 Processor"
  Major revision number: 1
  Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.44 GHz
  Concurrent copy and execution: Yes

Device 4: "Tesla T10 Processor"
  Major revision number: 1
  Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.44 GHz
  Concurrent copy and execution: Yes

Not necessarily true, because I think devices are only enumerated at boot.

With the cable disconnected, deviceQuery hangs too. Anyway, I just rebooted the host and now everything works again; if it happens again I will let you know.

BTW, I have filled in the form for CUDA 2.2 but haven’t received any response yet. How long does approval usually take?

Regards

Gaetano