nvidia-smi on Amazon EC2: cannot disable ECC for second GPU

Hi,

I recently began using Amazon EC2 as a testbed for multi-GPU computations.
Since I currently do not need the ECC feature, I disable it to gain more performance. However, I was very surprised (and annoyed) to find that I can only do this for the first GPU. Both GPUs are identical, Tesla M2050, as reported by deviceQuery:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: “Tesla M2050”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2817982464 bytes
Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No

Device 1: “Tesla M2050”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2817982464 bytes
Multiprocessors x Cores/MP = Cores: 14 (MP) x 32 (Cores/MP) = 448 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: Yes
Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = Tesla M2050, Device = Tesla M2050

PASSED

Press <Enter> to Quit…
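For reference, the ECC status that deviceQuery reports above can also be read programmatically through the CUDA runtime API. Below is a minimal sketch; cudaGetDeviceProperties and its ECCEnabled field are standard, but the file name is just a placeholder:

    // eccquery.cu (hypothetical file name): print the ECC state of every CUDA device
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA devices found\n");
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // ECCEnabled is 1 when ECC is currently active on the device, 0 otherwise
            printf("Device %d (%s): ECC %s\n", dev, prop.name,
                   prop.ECCEnabled ? "enabled" : "disabled");
        }
        return 0;
    }

Compiled with something like nvcc eccquery.cu -o eccquery, this should report "ECC enabled" for both devices on the instance above, and only reflect a configuration change after the required reboot.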

For nvidia-smi, I get the following:

nvidia-smi -r
ECC configuration for GPU 0:
Current: 1
After reboot: 1
ECC is not supported by GPU 1

and, needless to say, when I run
nvidia-smi -g 0 --ecc-config=1
everything works, but when I try
nvidia-smi -g 1 --ecc-config=0
I get
ECC is not supported by GPU 1 or the ECC configuration cannot be changed
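(Side note: newer driver packages also ship the NVML library, which exposes ECC queries similar to what nvidia-smi reports. Assuming nvml.h and libnvidia-ml are available on the instance, a minimal sketch for reading the current and pending ECC mode of each GPU could look like the following; treat it as an illustration, not a fix for the problem above:)

    // ecc_nvml.c (hypothetical file name): report current/pending ECC mode via NVML
    #include <stdio.h>
    #include <nvml.h>

    int main(void) {
        unsigned int count = 0, i;
        if (nvmlInit() != NVML_SUCCESS) {
            printf("NVML initialization failed\n");
            return 1;
        }
        nvmlDeviceGetCount(&count);
        for (i = 0; i < count; ++i) {
            nvmlDevice_t dev;
            nvmlEnableState_t current, pending;
            if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS)
                continue;
            // "current" is the ECC state right now; "pending" is the state after the next reboot
            if (nvmlDeviceGetEccMode(dev, &current, &pending) == NVML_SUCCESS) {
                printf("GPU %u: ECC current=%d pending=%d\n", i,
                       current == NVML_FEATURE_ENABLED,
                       pending == NVML_FEATURE_ENABLED);
            } else {
                printf("GPU %u: ECC mode not readable\n", i);
            }
        }
        nvmlShutdown();
        return 0;
    }

Build with something like gcc ecc_nvml.c -o ecc_nvml -lnvidia-ml.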

Has anybody seen this problem before? Is there a solution?

Cheers,
Serban

Can you file a bug with Amazon?

Just reported the issue; I hope it doesn’t take too long …

I don’t see the driver version you are using listed in your post (maybe I missed it). If you’re not using the latest driver version, could you retry your case with the latest Tesla M2050 driver posted to nvidia.com? Right now it appears to me to be 260.19.36: Linux x64 (AMD64/EM64T) Display Driver | 260.19.36 | Linux 64-bit | NVIDIA

Hey, thanks a lot! That did the trick!

The thing is, I had the latest developer driver from http://developer.nvidia.com/object/cuda_3_0_downloads.html#Linux, that is http://developer.download.nvidia.com/compute/cuda/3_0/drivers/devdriver_3.0_linux_64_195.36.15.run, which is 260.19.26. With the updated driver everything works correctly:

nvidia-smi -r
ECC configuration for GPU 0:
Current: 0
After reboot: 0
ECC configuration for GPU 1:
Current: 0
After reboot: 0

It’s somewhat odd that the latest developer driver is outdated … anyway, both GPUs are now performing well, so the problem is solved.

Cheers,

Serban