Is there an MPI function that can send/receive or broadcast float4?
I am working with a number of float4 variables (position, velocity, etc.) and need to use MPI to send and receive blocks of data between MPI processes. MPI functions require a datatype argument, e.g. MPI_FLOAT. My CUDA kernels are all set up to use float4s, but to transmit between processes I need to change these float4s into data that can be sent as MPI_FLOAT.
So on the host side I have the floats h_x, h_y and h_z, but on the device side I have float4 d_x.
A typical communication would be:
1. Process i uses cudaMemcpy to copy a float4 d_x into the host variables float h_x, float h_y and float h_z.
2. Process i calls MPI_Bcast on h_x, h_y and h_z (but this can only be done as MPI_FLOAT).
3. All processes use cudaMemcpy to copy their copies of h_x, h_y and h_z into float4 d_x.
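To make the conversion in steps 1 and 3 concrete, here is a rough host-only sketch of what I have in mind. It is not working MPI/CUDA code: Float4 is a stand-in I defined for CUDA's float4 so that the snippet compiles without the toolkit, and unpack and pack are hypothetical helper names. It assumes the float4 data has already been copied from the device into a host array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's float4 (assumption, so the sketch builds without CUDA).
// In real code this would be the built-in float4 from <cuda_runtime.h>.
struct Float4 { float x, y, z, w; };

// Hypothetical helper for step 1: split a host copy of d_x into h_x, h_y, h_z.
// The w component is dropped.
void unpack(const std::vector<Float4>& h_packed,
            std::vector<float>& h_x,
            std::vector<float>& h_y,
            std::vector<float>& h_z)
{
    const std::size_t n = h_packed.size();
    h_x.resize(n);
    h_y.resize(n);
    h_z.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        h_x[i] = h_packed[i].x;
        h_y[i] = h_packed[i].y;
        h_z[i] = h_packed[i].z;
    }
}

// Hypothetical helper for step 3: rebuild the Float4 array from h_x, h_y, h_z
// before copying it back to the device. The w component is set to zero.
std::vector<Float4> pack(const std::vector<float>& h_x,
                         const std::vector<float>& h_y,
                         const std::vector<float>& h_z)
{
    assert(h_x.size() == h_y.size() && h_y.size() == h_z.size());
    std::vector<Float4> h_packed(h_x.size());
    for (std::size_t i = 0; i < h_x.size(); ++i) {
        h_packed[i] = Float4{h_x[i], h_y[i], h_z[i], 0.0f};
    }
    return h_packed;
}
```

The cudaMemcpy and MPI_Bcast calls themselves are left out; in step 2 each of h_x, h_y and h_z would be broadcast separately as MPI_FLOAT.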
How can I transfer d_x into h_x, h_y and h_z, and vice versa after the MPI_Bcast?