Hi,
I am currently experimenting with the NCCL library.
I have an existing project that uses the CUDA driver API.
However, NCCL seems to work only with the runtime API.
So my question is: is it possible to get a CUdeviceptr from memory allocated with cudaMalloc?
I know that I can allocate pinned host memory with cudaMallocHost and get a device pointer to it, but kernels running on that memory are much slower than on memory allocated with cudaMalloc.
So, does NCCL directly support the CUDA driver API?
Thanks in advance.