NCCL failure common.cu:908 'unhandled cuda error'.

I have installed CUDA 8.0.61, cuDNN 5.1.10, and NCCL 2.1.15 on Ubuntu 14.04. I have successfully verified CUDA and cuDNN using the official examples.
However, I run into an error when running nccl-tests:
$ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4
NCCL failure common.cu:908 'unhandled cuda error'.

I have tried installing NCCL 2 both locally and from the network repo, but both attempts failed.
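
In case it helps with diagnosis, here is a rough sketch of how the failing run could be repeated with NCCL's debug logging turned on and with fewer GPUs, to see where the CUDA error first shows up. `NCCL_DEBUG` is NCCL's own logging variable. The `BIN` variable, the 1/2/4 GPU sweep, and the assumption that this runs from the nccl-tests directory are only illustrative, not something confirmed to isolate the problem.

```shell
# Assumption: this is run from the nccl-tests directory after a successful build.
# NCCL_DEBUG is NCCL's logging environment variable; INFO is its verbose level.
export NCCL_DEBUG=INFO

# Hypothetical helper variable; same binary as in the failing command above.
BIN=./build/all_reduce_perf

# Try 1, 2, then 4 GPUs to see at which GPU count the error first appears.
for gpus in 1 2 4; do
  if [ -x "$BIN" ]; then
    echo "== all_reduce_perf with $gpus GPU(s) =="
    "$BIN" -b 8 -e 128M -f 2 -g "$gpus" || echo "failed with $gpus GPU(s)"
  else
    echo "skipping: $BIN not found or not executable"
    break
  fi
done
```

If the single-GPU run already fails, the problem is probably in the CUDA/driver setup rather than in multi-GPU communication. The extra log lines printed before the failure may show which CUDA call returned the error.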

Can anyone help?

We created a new “Deep Learning Training and Inference” section in Devtalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

Topic URLs will not change with the re-categorization, so your bookmarks and links will continue to work as before.

-Siddharth