cudaDeviceEnablePeerAccess fails: enabling peer-to-peer device memory access fails

Hello,
When I enable peer memory access in my code, the call fails with an error. See the following code snippet (from my file “cuda_code.cu”):
++++++++++++++++++++++++++++++
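int canAccessPeer = 0;
// Ask whether device 0 can directly access memory on device 1.
if (cudaSuccess == cudaDeviceCanAccessPeer(&canAccessPeer, 0, 1))
{
    if (canAccessPeer == 1)
        printf("True\n");
    else
        printf("False\n");
}
// Called unconditionally, even when canAccessPeer is 0.
cudaDeviceEnablePeerAccess(1, 0);
CudaCheckError();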
+++++++++++++++++++++++++++++

The value returned in “canAccessPeer” is 0.
The “CudaCheckError()” call fails with the following error:
cudaCheckError() failed at cuda_code.cu:326 : invalid device ordinal.

My system information:

Two Tesla M2070
64 bit, CentOS 5.6

NVRM version: NVIDIA UNIX x86_64 Kernel Module 270.41.19 Mon May 16 23:32:08 PDT 2011
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.0
+++++++++++++++++++++++

Also, here is the output of the SDK sample “simpleP2P”:

Checking for multiple GPUs…
CUDA-capable device count: 2

GPU0 = " Tesla M2070" IS capable of Peer-to-Peer (P2P)
GPU1 = " Tesla M2070" IS capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access…

Peer access from Tesla M2070 (GPU0) → Tesla M2070 (GPU1) : No
Peer access from Tesla M2070 (GPU1) → Tesla M2070 (GPU0) : No
Two or more Tesla(s) with class GPUs are required for ./simpleP2P to run.
Support for UVA requires a Tesla with SM 2.0 capabilities.
Peer to Peer access is not available between GPU0 ↔ GPU1, waiving test.
PASSED
+++++++++++++++++++++++++

Thanks,
Nikhil

This error also occurs with two Tesla M2075 cards when using the 4.1 release on 64-bit CentOS 6.0. Below is the code snippet from the SDK example “simpleP2P.cu”:
++++++++++++++++++++++++
// Enable peer access
printf("Enabling peer access between GPU%d and GPU%d...\n", gpuid[0], gpuid[1]);
checkCudaErrors(cudaSetDevice(gpuid[0]));
checkCudaErrors(cudaDeviceEnablePeerAccess(gpuid[1], 0));
checkCudaErrors(cudaSetDevice(gpuid[1]));
checkCudaErrors(cudaDeviceEnablePeerAccess(gpuid[0], 0));
++++++++++++++++++++++++
Here is the error when executing it:
simpleP2P.cu(273) : CUDA Runtime API error 10: invalid device ordinal

NOTE: I commented out the “exit” statement in simpleP2P.cu so that “cudaDeviceEnablePeerAccess” is still called.
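
For reference, here is a minimal sketch of how the enable call could be guarded. As far as I can tell from the runtime API documentation, cudaDeviceEnablePeerAccess returns cudaErrorInvalidDevice ("invalid device ordinal") when cudaDeviceCanAccessPeer reports that the current device cannot access the peer, which would match both failures above. The helper name tryEnablePeerAccess is hypothetical and not part of CUDA or the SDK. The sketch only avoids the failing call; it does not explain why the two GPUs report no peer access in the first place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA or the SDK): enable access from
// device `from` to device `to` only when the runtime reports it is possible.
// Returns true if peer access is enabled when the function returns.
static bool tryEnablePeerAccess(int from, int to)
{
    int canAccess = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&canAccess, from, to);
    if (err != cudaSuccess || canAccess == 0) {
        printf("GPU%d -> GPU%d: peer access not available\n", from, to);
        return false;
    }

    // cudaDeviceEnablePeerAccess applies to the current device,
    // so select `from` before enabling access to `to`.
    err = cudaSetDevice(from);
    if (err == cudaSuccess)
        err = cudaDeviceEnablePeerAccess(to, 0);  // flags must be 0

    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        cudaGetLastError();  // reset the recorded error; access is already on
        return true;
    }
    if (err != cudaSuccess) {
        printf("GPU%d -> GPU%d: %s\n", from, to, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        printf("Fewer than two CUDA devices found\n");
        return 0;
    }

    // Peer access is directional, so each direction is enabled separately.
    bool ok01 = tryEnablePeerAccess(0, 1);
    bool ok10 = tryEnablePeerAccess(1, 0);
    printf("Peer access enabled in both directions: %s\n",
           (ok01 && ok10) ? "yes" : "no");
    return 0;
}
```

When the helper returns false, copies between the two devices can still be made with cudaMemcpyPeer or plain cudaMemcpy; they just do not take the direct peer-to-peer path.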