Caffe BatchReindexLayer Test Fails with CUDA 9.1

My OS is CentOS 7.4.1708. I installed the CUDA toolkit 9.1 from the rpm (local) package. I am compiling Caffe with g++ 7.2 (from the gxx_linux-64 package in Anaconda 3). When I run the Caffe tests, almost all of them pass, except for one BatchReindexLayer test. Please see more details in this issue on Caffe GitHub:

In short, if I turn on the -G flag for nvcc, the BatchReindexLayer test passes. Without it, the test fails.

I am afraid that this is an optimization bug in the nvcc compiler 9.1.85, which could be a serious problem. Another person in the GitHub issue has also confirmed the problem.

I am not an experienced CUDA or Caffe programmer and don’t know how to write a minimal program that reproduces this problem. Could NVIDIA look into it?

If you want to file a bug, the correct place to do so is at developer.nvidia.com.
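For the minimal reproducer, a sketch along these lines may be a starting point. It is only an illustration under assumptions: the kernel name `gather_rows`, the sizes, the file name `repro_sketch.cu`, and the choice to store indices as floats are all made up here and are not taken from Caffe, so the kernel would need to be swapped for the one that actually fails. The idea is simply to compare the GPU result against a CPU reference, building once with and once without -G.

```cuda
// repro_sketch.cu (hypothetical file name)
// Build without device debug: nvcc -o repro repro_sketch.cu
// Build with device debug:    nvcc -G -o repro_debug repro_sketch.cu
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
              cudaGetErrorString(err_));                         \
      exit(1);                                                   \
    }                                                            \
  } while (0)

// Hypothetical gather kernel: out[n][j] = in[perm[n]][j].
// Illustrative stand-in only; replace with the kernel under suspicion.
__global__ void gather_rows(int count, int inner_dim, const float* in,
                            const float* perm, float* out) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < count;
       i += blockDim.x * gridDim.x) {
    int n = i / inner_dim;
    int src = static_cast<int>(perm[n]);  // indices stored as floats (assumption)
    out[i] = in[src * inner_dim + i % inner_dim];
  }
}

int main() {
  const int rows_in = 5, rows_out = 7, inner_dim = 12;  // arbitrary sizes
  const int count = rows_out * inner_dim;

  std::vector<float> in(rows_in * inner_dim), perm(rows_out);
  std::vector<float> ref(count), out(count);
  for (size_t i = 0; i < in.size(); ++i) in[i] = static_cast<float>(i);
  for (int n = 0; n < rows_out; ++n) perm[n] = static_cast<float>((n * 3) % rows_in);

  // CPU reference result.
  for (int n = 0; n < rows_out; ++n)
    for (int j = 0; j < inner_dim; ++j)
      ref[n * inner_dim + j] = in[static_cast<int>(perm[n]) * inner_dim + j];

  float *d_in, *d_perm, *d_out;
  CHECK(cudaMalloc(&d_in, in.size() * sizeof(float)));
  CHECK(cudaMalloc(&d_perm, perm.size() * sizeof(float)));
  CHECK(cudaMalloc(&d_out, count * sizeof(float)));
  CHECK(cudaMemcpy(d_in, in.data(), in.size() * sizeof(float), cudaMemcpyHostToDevice));
  CHECK(cudaMemcpy(d_perm, perm.data(), perm.size() * sizeof(float), cudaMemcpyHostToDevice));

  gather_rows<<<(count + 255) / 256, 256>>>(count, inner_dim, d_in, d_perm, d_out);
  CHECK(cudaGetLastError());
  CHECK(cudaDeviceSynchronize());
  CHECK(cudaMemcpy(out.data(), d_out, count * sizeof(float), cudaMemcpyDeviceToHost));

  // Compare GPU output with the CPU reference element by element.
  int mismatches = 0;
  for (int i = 0; i < count; ++i)
    if (out[i] != ref[i]) ++mismatches;
  printf("%s (%d mismatches)\n", mismatches ? "MISMATCH" : "match", mismatches);

  CHECK(cudaFree(d_in));
  CHECK(cudaFree(d_perm));
  CHECK(cudaFree(d_out));
  return mismatches ? 1 : 0;
}
```

If the two builds disagree for the real kernel, attaching the source file, both nvcc command lines, and the GPU model to the bug report should make the problem much easier to triage.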

Thanks, txbob.