cudaStreamSynchronize triggers 'an illegal memory access was encountered'

Hello everyone!

After adding a few lines to the ring all_reduce kernel in my own NCCL fork (here) and running this NCCL example, I get the following error:

Failed: Cuda error ‘an illegal memory access was encountered’ is

Does anybody have an idea what could be causing this error?

How to reproduce the error:

First, let's build the NCCL shared library:

$ git clone
$ cd nccl
$ make CUDA_HOME=<path to cuda install>

Then copy and paste this example into a text file with a .c or .cu extension, compile it with nvcc (make sure to pass -lnccl -lcuda -lcudart as options), and run the generated binary.
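
For reference, the compile-and-run step looks roughly like this. It is only a sketch: the file name example.cu is a placeholder I picked, and the paths assume you are still inside the nccl directory and that make wrote its output to build/ (the default when BUILDDIR is not set). Adjust both to your setup.

```shell
# Assumption: the example was saved as example.cu (placeholder name).
SRC="${SRC:-example.cu}"
# Assumption: we are inside the nccl clone and make used its default build/ directory.
NCCL_BUILD="${NCCL_BUILD:-$PWD/build}"

# Point nvcc at the headers and library of the fork, then link the three libraries.
# Note: this simple form assumes the paths contain no spaces.
NVCC_CMD="nvcc $SRC -o example -I$NCCL_BUILD/include -L$NCCL_BUILD/lib -lnccl -lcuda -lcudart"
echo "$NVCC_CMD"

# Only compile and run when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # LD_LIBRARY_PATH makes the binary load the fork's libnccl.so rather than a system copy.
  $NVCC_CMD && LD_LIBRARY_PATH="$NCCL_BUILD/lib:${LD_LIBRARY_PATH:-}" ./example
fi
```

Setting LD_LIBRARY_PATH matters here because otherwise the binary may pick up a stock libnccl.so installed on the system instead of the modified one.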

P.S. I increased the run-time heap size like this, but it did not solve the problem.

This may be of interest: