cudaStreamSynchronize triggers 'an illegal memory access was encountered'

Hello everyone!

After adding a few lines to the ring all_reduce kernel in my own NCCL fork (here) and running this NCCL example, I got the following error:

Failed: Cuda error example.cu:122 'an illegal memory access was encountered'

Line 122 of the example is:
CUDACHECK(cudaStreamSynchronize(s[i]));

I was wondering if anybody has any idea about this error.

How to reproduce the error:

First, build the NCCL shared library:

$ git clone https://github.com/Hamidreza-Ramezani/nccl.git
$ cd nccl
$ make src.build CUDA_HOME=<path to cuda install>

Then copy the example into a text file with a .c or .cu extension and compile it with nvcc (make sure to pass -lnccl -lcuda -lcudart as options). Finally, run the generated binary.
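The compile-and-run step could look roughly like the sketch below. It assumes the example is saved as example.cu in the directory that contains the nccl checkout, and that the library was built into the default build/ directory inside it; the file name and the paths are assumptions, so adjust them to your setup.

```shell
# Assumed layout: ./example.cu and ./nccl (the checkout built above).
NCCL_BUILD="$PWD/nccl/build"   # default output directory of `make src.build`

# Compile against the freshly built NCCL headers and library.
nvcc example.cu -o example \
    -I"$NCCL_BUILD/include" -L"$NCCL_BUILD/lib" \
    -lnccl -lcuda -lcudart

# Make sure the loader picks up the fork's libnccl.so rather than a system copy.
LD_LIBRARY_PATH="$NCCL_BUILD/lib:$LD_LIBRARY_PATH" ./example
```

Setting LD_LIBRARY_PATH only for this one command keeps the modified library from affecting other programs on the machine.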

PS: I increased the run-time heap size like this, but it did not solve the problem.

This may be of interest: