When running Sanitizer on code ported from NCCL, I see a data race reported between the proxy-related threads. The two call paths are as follows:
Thread-A
proxy.cc: ncclProxyProgress →
proxy.cc: progressOps →
net.cc: sendProxyProgress
sub->sendMhandle = resources->mhandles[args->protocol];
Thread-B
proxy.cc: ncclProxyService →
proxy.cc: proxyProgressAsync →
net.cc: sendProxyConnect →
net_ib.cc: ncclIbRegMr →
net_ib.cc: ncclIbRegMrDmaBuf
*mhandle = (void*) mhandleWrapper;
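To make the pattern concrete, here is a minimal C++ sketch of how I understand the two accesses. It is not NCCL code: `Resources`, `connect_side`, `progress_side`, `run_handoff`, and the `ready` flag are hypothetical names I made up for illustration. The sketch assumes the writer publishes the handle through a release/acquire flag, which makes the handoff race-free. I do not know whether NCCL orders the two threads in an equivalent way; if it does not, the plain store and plain load below would be unordered and a sanitizer would be expected to flag them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the proxy resources; not an NCCL type.
struct Resources {
  void* mhandle = nullptr;         // plain field, like the stored memory handle
  std::atomic<bool> ready{false};  // assumed publication flag (illustration only)
};

// Plays the role of Thread-B: the connect/registration path writes the handle.
void connect_side(Resources& r, void* handle) {
  r.mhandle = handle;                              // plain store
  r.ready.store(true, std::memory_order_release);  // publish the store above
}

// Plays the role of Thread-A: the progress path reads the handle.
void* progress_side(Resources& r) {
  // Wait until the writer has published; acquire pairs with the release store.
  while (!r.ready.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  return r.mhandle;  // ordered after the writer's plain store
}

// Runs both roles once and reports whether the reader saw the written handle.
bool run_handoff() {
  static int dummy = 0;  // stands in for the registered memory region
  Resources r;
  void* seen = nullptr;
  std::thread a([&] { seen = progress_side(r); });
  std::thread b([&] { connect_side(r, &dummy); });
  a.join();
  b.join();
  return seen == &dummy;
}
```

Calling `run_handoff()` from a test program built with `-fsanitize=thread` should not produce a report for this sketch. Removing the `ready` flag and reading `mhandle` directly would reproduce the kind of unordered access I am asking about.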
I would like to know: is NCCL thread safe? If so, what might be causing this data race? If not, is there any way to avoid it?
Best Regards.