I have a few host functions, each of which wraps a kernel launch. I want to measure the run time of these functions. Is the following method OK?
cudaEventRecord(start, 0);
host_function_1();  // each host function launches a kernel
cudaEventRecord(stop, 0);
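
For context, the snippet above only records the two events. A complete measurement with CUDA events would presumably also need to create the events, wait for the stop event to complete, and read the elapsed time. The following is a minimal sketch of that pattern, not a definitive implementation. The kernel dummy_kernel, the wrapper signature host_function_1(float *, int), and the helper time_host_function_1 are hypothetical stand-ins, since the real host functions are not shown, and error checking is kept minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real work (assumption).
__global__ void dummy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 2.0f + 1.0f;
    }
}

// Hypothetical host wrapper: launches one kernel on the default stream.
void host_function_1(float *d_data, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    dummy_kernel<<<blocks, threads>>>(d_data, n);
}

// Times one call of host_function_1 with CUDA events.
// Returns the elapsed time in milliseconds.
float time_host_function_1(float *d_data, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);      // enqueue start marker on stream 0
    host_function_1(d_data, n);     // kernel launch is asynchronous
    cudaEventRecord(stop, 0);       // enqueue stop marker on stream 0

    // Block the host until the stop event (and the kernel before it) has finished.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    float ms = time_host_function_1(d_data, n);
    printf("host_function_1 took %.3f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```

Without the cudaEventSynchronize(stop) call, cudaEventElapsedTime can return cudaErrorNotReady because the kernel launch returns to the host before the GPU work is done. The first launch in a process also tends to include one-time setup cost, so timing a second call may be more representative.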