Hello everyone!
After adding a few lines into ring all_reduce kernel in my own nccl fork (here) and running this nccl example, I’ve seen the following error:
Failed: Cuda error example.cu:122 ‘an illegal memory access was encountered’
example1.cu:122 is
CUDACHECK(cudaStreamSynchronize(s[i]));
I was wondering if anybody has any idea about this error.
How to reproduce the error:
First of all, lets build the nccl shared library using:
$ git clone https://github.com/Hamidreza-Ramezani/nccl.git
$ cd nccl
$ make src.build CUDA_HOME=<path to cuda install>
Then copy and paste this example in a text file with an extension c or cu. Then compile the example using nvcc (make sure to add -lnccl -lcuda -lcudart as options). Then, execute the generated binary file.
ps: I increased run-time heap size like this , but it did not solve the problem.