Peer-to-peer problem with multiple Tesla C2070 GPUs

I am currently working with GPUDirect in CUDA 4.0, and my programs always crash when I use three GPUs instead of two. I do not know the reason yet, but I have a hypothesis:

I am using several kernels that all read data located on GPU0 and also write their results back into GPU0's memory. Is it possible that the program crashes if GPU1 writes to an array on GPU0 while GPU2 is reading the same data?
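Simultaneous reads and writes of the same global memory are a data race rather than something the hardware rejects. The reading kernel may see old or partially updated values, which would typically crash the program only indirectly (for example, if such a value is later used as an index). If the kernel on GPU1 is meant to finish before the kernel on GPU2 reads the data, one way to enforce that order is a cross-device event. The sketch below assumes one stream per GPU and uses only documented runtime calls (cudaEventRecord, cudaStreamWaitEvent). The helper name order_after is hypothetical, not part of the original code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: makes all work submitted *later* to readerStream
// (on readerDev) wait until everything *already* submitted to writerStream
// (on writerDev) has completed. The host thread does not block.
cudaError_t order_after(int writerDev, cudaStream_t writerStream,
                        int readerDev, cudaStream_t readerStream)
{
    assert(writerDev >= 0 && readerDev >= 0);
    cudaError_t err;
    cudaEvent_t done;

    // The event is created and recorded on the writer's device,
    // in the same stream as the writing kernel.
    if ((err = cudaSetDevice(writerDev)) != cudaSuccess) return err;
    if ((err = cudaEventCreateWithFlags(&done, cudaEventDisableTiming)) != cudaSuccess)
        return err;
    if ((err = cudaEventRecord(done, writerStream)) != cudaSuccess) {
        cudaEventDestroy(done);
        return err;
    }

    // The reader's stream waits on the GPU; the event may belong to another device.
    if ((err = cudaSetDevice(readerDev)) == cudaSuccess)
        err = cudaStreamWaitEvent(readerStream, done, 0);

    // Destroying a pending event is allowed: its resources are released
    // once the device has completed it.
    cudaSetDevice(writerDev);
    cudaEventDestroy(done);
    return err;
}
```

Usage: launch the writing kernel in the GPU1 stream, call order_after with the GPU1 and GPU2 streams, then launch the reading kernel in the GPU2 stream.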

I also get an error in this code:

checkErrors("P2P start");

// Enable peer access
for(int i=0; i<gpu_n; i++)
    for(int j=i+1; j<gpu_n; j++)
    {
        printf("Enabling peer access between GPU%d and GPU%d...\n", gpuid_tesla[i], gpuid_tesla[j]);
        cudaDeviceEnablePeerAccess(gpuid_tesla[j], gpuid_tesla[i]);
        cudaDeviceEnablePeerAccess(gpuid_tesla[i], gpuid_tesla[i]);
    }

checkErrors("P2P successful");

with the output:

Enabling peer access between GPU1 and GPU2...
CUDA Error: invalid argument (at P2P)

Does this mean that P2P access between GPU1 and GPU2 cannot be established?
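Independently of the topology, the calls above do not match the runtime API. cudaDeviceEnablePeerAccess(int peerDevice, unsigned int flags) grants the *current* device (selected with cudaSetDevice) access to peerDevice, and flags is reserved and must be 0. Passing a device ID as the second argument is therefore a plausible source of this "invalid argument" error, and a device also cannot be enabled as its own peer. A minimal sketch of the loop written against that signature, checking cudaDeviceCanAccessPeer first and enabling both directions for every pair (the name enable_all_peer_access is hypothetical):

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: enables peer access in both directions between every
// pair of devices in devs[0..n-1]. Returns the number of directed links that
// are enabled on return, or -1 as soon as a pair cannot be connected or a
// CUDA call fails.
int enable_all_peer_access(const int *devs, int n)
{
    assert(n == 0 || devs != NULL);
    int enabled = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;  // a device cannot be its own peer

            int canAccess = 0;
            if (cudaDeviceCanAccessPeer(&canAccess, devs[i], devs[j]) != cudaSuccess ||
                !canAccess) {
                fprintf(stderr, "GPU%d cannot access GPU%d directly\n", devs[i], devs[j]);
                return -1;
            }

            printf("Enabling peer access GPU%d -> GPU%d...\n", devs[i], devs[j]);
            // Access is granted to the *current* device...
            if (cudaSetDevice(devs[i]) != cudaSuccess) return -1;
            // ...and the flags argument must be 0.
            cudaError_t err = cudaDeviceEnablePeerAccess(devs[j], 0);
            if (err == cudaErrorPeerAccessAlreadyEnabled) {
                cudaGetLastError();  // not a real failure; clear the recorded error
            } else if (err != cudaSuccess) {
                fprintf(stderr, "GPU%d -> GPU%d: %s\n",
                        devs[i], devs[j], cudaGetErrorString(err));
                return -1;
            }
            ++enabled;
        }
    }
    return enabled;
}
```

With the three GPUs from the question, a call such as enable_all_peer_access(gpuid_tesla, gpu_n) would either enable all six directed links or report the first pair that cannot be connected.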

Thank you for your help!

P2P works only between GPUs that sit under the same PCIe root complex.
You can inspect the PCIe tree with "lspci -t".
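Besides lspci, the CUDA runtime can report directly whether two devices can reach each other. A minimal sketch (the helper name report_p2p_topology is hypothetical) that prints the result of cudaDeviceCanAccessPeer for every ordered pair of visible devices; a pair reported as "NO peer access" would be consistent with the two GPUs sitting under different PCIe roots:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints whether each visible device can directly
// access every other one. Returns the number of visible devices, or -1
// if the device count could not be queried (e.g. no driver or no GPU).
int report_p2p_topology(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    for (int i = 0; i < count; ++i) {
        for (int j = 0; j < count; ++j) {
            if (i == j) continue;  // a device is never its own peer
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU%d -> GPU%d : %s\n", i, j,
                   canAccess ? "peer access possible" : "NO peer access");
        }
    }
    return count;
}
```

Note that the device numbers printed here follow the CUDA runtime's enumeration order, which need not match the bus order shown by lspci.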