Hello, I'm working on a matrix multiplication project implemented with both message passing (MPI) and CUDA. What is the most suitable way to measure the run time and compare it between the MPI and CUDA versions?

In the MPI version I use MPI_Wtime(), which gives the run time in seconds. In the CUDA version I use a ready-made function that reports GB/s, i.e. the throughput of the kernel run. Is this a correct way to compare? As far as I know, the GPU is good for throughput and the CPU for latency. There seem to be a number of links about calculating CPU GFLOPS, but none of them mention how to do it or which function is best to use.

I'm aware that comparing a GPU with MPI might not be fair, but there should be some way to get an indication of how the two compare. What I mean is: which programming function should I use?
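To make the question concrete, here is a minimal sketch of the kind of common metric I have in mind. It assumes both versions multiply square N x N matrices and that each run yields an elapsed wall-clock time in seconds (for example, the difference of two MPI_Wtime() calls on the MPI side). The function names `matmul_flops`, `gflops`, and `speedup` are my own hypothetical helpers, not part of MPI or CUDA, and the 2*N^3 operation count is the usual approximation for a dense multiply.

```c
#include <assert.h>

/* Approximate floating-point operation count for a dense
   (N x N) * (N x N) multiply: N^3 multiplications plus about
   N^3 additions. Hypothetical helper, not an MPI/CUDA function. */
double matmul_flops(int n)
{
    return 2.0 * n * n * n;
}

/* Convert an elapsed wall-clock time in seconds into GFLOP/s,
   so the MPI and CUDA runs are expressed in the same unit. */
double gflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) / seconds / 1e9;
}

/* Speedup of one run relative to a reference run: t_ref / t_new.
   Both times are assumed to cover the same work, e.g. both
   including or both excluding data transfers. */
double speedup(double t_ref, double t_new)
{
    assert(t_ref > 0.0 && t_new > 0.0);
    return t_ref / t_new;
}
```

With something like this, `gflops(N, t_mpi)` and `gflops(N, t_cuda)` would be directly comparable, and `speedup(t_mpi, t_cuda)` would give a single ratio. What I am unsure about is which timing function should produce the CUDA-side time.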
Can you advise on this matter?
Much appreciated, thanks!