NCCL support for complex data types

Please add NCCL support for complex data types. I currently convert the complex data I use with cuSOLVER and cuBLAS to real numbers before sending it with NCCL. However, this requires twice the allocated device memory, and I am running out. I am using CUDA Fortran along with its interfaces to these libraries. I have also tried CUDA-aware MPI, but I am having trouble with it; NCCL seems to work better. Thank you.