Can I call MPI functions from a thread inside a CUDA kernel?

Dear experts,
Can I use CUDA-aware MPI directly inside a CUDA kernel, i.e., from a device thread?
I tried the following, but it didn’t work. I am not sure whether I used it incorrectly or whether calling MPI functions from a thread is simply not permitted.

Could you please share any ideas?
Thank you very much!

No, you have to call MPI from host code. What CUDA-aware MPI adds is that the buffers you pass to MPI calls can be device buffers.
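
A minimal sketch of that pattern, assuming an MPI library built with CUDA support and at least two ranks: the kernel only fills a device buffer, and the host then passes the device pointer straight to MPI_Send / MPI_Recv. The names fill and d_buf are made up for illustration, and the build line in the comment is an assumption about your toolchain.

```cuda
// Assumed build/run commands (adjust for your setup):
//   nvcc -ccbin mpicxx example.cu -o example
//   mpirun -np 2 ./example
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel: device code only writes the buffer, it makes no MPI calls.
__global__ void fill(float *buf, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) std::printf("Run with at least 2 ranks.\n");
        MPI_Finalize();
        return 0;
    }

    const int n = 1024;
    float *d_buf = nullptr;
    cudaMalloc(&d_buf, n * sizeof(float));  // device buffer on every rank

    if (rank == 0) {
        fill<<<(n + 255) / 256, 256>>>(d_buf, n, 1.0f);
        // The kernel must finish before MPI reads the buffer.
        cudaDeviceSynchronize();
        // Host-side MPI call; the device pointer is passed directly
        // (this is the part that needs a CUDA-aware MPI build).
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Copy one element back to the host just to inspect it.
        float first = 0.0f;
        cudaMemcpy(&first, d_buf, sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("rank 1: first element = %f\n", first);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

With an MPI build that is not CUDA-aware, the same structure should still work if you stage the data through a host buffer with cudaMemcpy before and after the MPI calls.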

Thank you very much, Robert!
I also have another question about MPI. I will submit a separate ticket for it.
Thanks, again!