I am fairly new to CUDA and want to measure the execution time of my CUDA program. I need to compare the performance of a program that computes the sum of two 1000 x 1000 matrices, first on the GPU and then on the CPU using device emulation mode. (Do you think device emulation mode can be used as a benchmark?)
For this I need to know the execution time in each case. How do we measure the execution time? Which function and library should be used, and where exactly does the timing call go in the code?
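To make the question concrete, here is a rough sketch of the kind of timing I have in mind for the GPU case. It assumes the CUDA runtime event functions (`cudaEventCreate`, `cudaEventRecord`, `cudaEventSynchronize`, `cudaEventElapsedTime`) are the right tool. The kernel `matAdd` and the helper `timeMatAddMs` are names I made up for illustration, and the block size of 256 is an arbitrary choice. It times only the kernel, not the host-to-device copies, and I am not sure that is the correct placement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise sum of two matrices stored as flat arrays of `total` floats.
__global__ void matAdd(const float *a, const float *b, float *c, int total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: returns the kernel time in milliseconds for an
// n x n addition, or -1.0f if a device allocation fails.
float timeMatAddMs(int n)
{
    int total = n * n;
    size_t bytes = (size_t)total * sizeof(float);
    float *a = 0, *b = 0, *c = 0;

    if (cudaMalloc((void **)&a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&b, bytes) != cudaSuccess ||
        cudaMalloc((void **)&c, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c);
        return -1.0f;
    }
    // Zero the inputs so the kernel reads defined values.
    cudaMemset(a, 0, bytes);
    cudaMemset(b, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;                              // assumed block size
    int blocks = (total + threads - 1) / threads;   // round up

    cudaEventRecord(start, 0);                      // timestamp before launch
    matAdd<<<blocks, threads>>>(a, b, c, total);
    cudaEventRecord(stop, 0);                       // timestamp after launch
    cudaEventSynchronize(stop);                     // wait until kernel is done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);         // elapsed time in ms

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return ms;
}

int main()
{
    float ms = timeMatAddMs(1000);
    if (ms < 0.0f) {
        printf("CUDA allocation failed\n");
        return 1;
    }
    printf("matAdd 1000 x 1000: %f ms\n", ms);
    return 0;
}
```

I would compile this with `nvcc` and run it a few times, since I assume the first launch includes some one-time startup cost. For the CPU case I assume an ordinary host timer around the loop would be needed instead.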
Thanks for spending your precious time!!