Small BUG in deviceQuery.cpp

Sergiy · January 5, 2014, 12:44am

Please look at cuda-5.5/samples/1_Utilities/deviceQuery/deviceQuery.cpp,
starting at line 221 (I am on Linux).

The number of GPUs is stored in “gpu_p2p_count”, and we want to check every pair of GPUs for peer-access (RDMA) support.
The code could use a simple scheme like the following:
for (int i = 0; i < gpu_p2p_count; i++)
    for (int j = i + 1; j < gpu_p2p_count; j++) {
        // error checking and result printing omitted
        cudaDeviceCanAccessPeer(&can_access, i, j);
        cudaDeviceCanAccessPeer(&can_access, j, i);
    }

Now compare this with the corresponding fragment in CUDA 5.5 (simplified):

for (int i = 0; i < gpu_p2p_count-1; i++)
    for (int j = 1; j < gpu_p2p_count; j++)
        cudaDeviceCanAccessPeer(&can_access, i, j);

for (int j = 1; j < gpu_p2p_count; j++)
    for (int i = 0; i < gpu_p2p_count-1; i++)
        cudaDeviceCanAccessPeer(&can_access, j, i);

In the table below, the first column is the value of gpu_p2p_count, and the rest of the line lists the combinations checked (device index followed by peer index):
2 01 10
3 01 02 11 12 10 11 20 21
4 01 02 03 11 12 13 21 22 23 10 11 12 20 21 22 30 31 32
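
The table can be reproduced without any GPU by replaying the two loop nests and recording the index pairs instead of calling cudaDeviceCanAccessPeer. This is only a minimal sketch: the helper name sample_pairs is made up for illustration and is not part of deviceQuery.cpp, and it assumes fewer than 10 GPUs so that each index prints as one digit.

```cpp
#include <string>

// Replays the two simplified CUDA 5.5 loop nests and returns the
// (device, peer) index pairs they would pass to cudaDeviceCanAccessPeer,
// formatted like the table rows, e.g. "01 10".
// sample_pairs is a hypothetical helper name, not part of the sample.
// Assumes gpu_p2p_count < 10 so every index is a single digit.
std::string sample_pairs(int gpu_p2p_count) {
    std::string out;
    auto add = [&out](int dev, int peer) {
        if (!out.empty()) out += ' ';
        out += std::to_string(dev) + std::to_string(peer);
    };

    // First loop nest: access from i to j.
    for (int i = 0; i < gpu_p2p_count - 1; i++)
        for (int j = 1; j < gpu_p2p_count; j++)
            add(i, j);

    // Second loop nest: access from j to i.
    for (int j = 1; j < gpu_p2p_count; j++)
        for (int i = 0; i < gpu_p2p_count - 1; i++)
            add(j, i);

    return out;
}
```

Calling sample_pairs(3) should give back the second row of the table, which makes it easy to spot pairs such as 11 where the device and its peer are the same GPU.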

The result is correct with two GPUs.
But with three or more GPUs, why check a GPU against itself (11, 22)?
And why check the same combination twice?
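
To put numbers on both questions, one could count the self-checks and repeated checks made by each loop scheme. The sketch below does this on plain indices, with no CUDA calls; CheckStats, sample_stats and proposed_stats are hypothetical names introduced here for illustration only.

```cpp
#include <set>
#include <utility>

// Tallies what a loop scheme would ask cudaDeviceCanAccessPeer to check.
// All names here are hypothetical and not taken from deviceQuery.cpp.
struct CheckStats {
    int total = 0;        // number of checks issued
    int self_checks = 0;  // checks where device == peer
    int repeats = 0;      // checks of a (device, peer) pair already seen
    std::set<std::pair<int, int>> seen;

    void record(int dev, int peer) {
        total++;
        if (dev == peer) self_checks++;
        if (!seen.insert({dev, peer}).second) repeats++;
    }
};

// The simplified CUDA 5.5 loops quoted above.
CheckStats sample_stats(int gpu_p2p_count) {
    CheckStats s;
    for (int i = 0; i < gpu_p2p_count - 1; i++)
        for (int j = 1; j < gpu_p2p_count; j++)
            s.record(i, j);
    for (int j = 1; j < gpu_p2p_count; j++)
        for (int i = 0; i < gpu_p2p_count - 1; i++)
            s.record(j, i);
    return s;
}

// The proposed loops: each unordered pair once, checked in both directions.
CheckStats proposed_stats(int gpu_p2p_count) {
    CheckStats s;
    for (int i = 0; i < gpu_p2p_count; i++)
        for (int j = i + 1; j < gpu_p2p_count; j++) {
            s.record(i, j);
            s.record(j, i);
        }
    return s;
}
```

For four GPUs, sample_stats(4) counts 18 checks, of which 4 are self-checks and 4 are repeats, while proposed_stats(4) counts 12 checks with neither.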