Can I use CUDA-aware MPI directly inside a CUDA kernel, i.e., call MPI functions from a device thread?
I tried the following, but it didn’t work. I’m not sure whether I used it incorrectly or whether calling MPI functions from a thread is simply not permitted.
Could you please share any ideas?
Thank you very much!