Is kernel execution asynchronous?

If a kernel is executed asynchronously, how can I measure its running time?
For example:

Timer();
Kernel1<<<grid, block>>>();
Timer();

Does this give the real execution time of the kernel? Or do we have to call cudaThreadSynchronize() after the kernel launch, like this?
Timer();
Kernel1<<<grid, block>>>();
cudaThreadSynchronize();
Timer();

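Yes, use the second way. A kernel launch returns control to the host right away, so without the synchronization you only measure the time of the launch call itself (an ordinary C function invocation), not the time the kernel spends running on the GPU.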
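
To make the difference concrete, here is a minimal sketch rather than a definitive benchmark. The kernel body, the problem size and the use of std::chrono as the "Timer" are my own assumptions for illustration; they are not from the question. It also uses cudaDeviceSynchronize(), since cudaThreadSynchronize() is deprecated in newer CUDA releases, and it shows CUDA events as an alternative way to time a kernel on the device side.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Kernel1: each thread does some busy work.
__global__ void Kernel1(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = 0.0f;
        for (int k = 0; k < 10000; ++k)
            x += sinf(x + k);
        out[i] = x;
    }
}

static double elapsed_ms(std::chrono::steady_clock::time_point a,
                         std::chrono::steady_clock::time_point b)
{
    return std::chrono::duration<double, std::milli>(b - a).count();
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    // Warm-up launch so one-time initialization is not included in the timings.
    Kernel1<<<grid, block>>>(d_out, n);
    cudaDeviceSynchronize();

    // Host timer: t1 is taken right after the launch returns,
    // t2 only after the kernel has actually finished.
    auto t0 = std::chrono::steady_clock::now();
    Kernel1<<<grid, block>>>(d_out, n);
    auto t1 = std::chrono::steady_clock::now();
    cudaDeviceSynchronize();
    auto t2 = std::chrono::steady_clock::now();

    std::printf("launch only:        %.3f ms\n", elapsed_ms(t0, t1));
    std::printf("launch + sync:      %.3f ms\n", elapsed_ms(t0, t2));

    // Alternative: CUDA events are recorded on the GPU timeline.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    Kernel1<<<grid, block>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached

    float event_ms = 0.0f;
    cudaEventElapsedTime(&event_ms, start, stop);
    std::printf("cudaEvent timing:   %.3f ms\n", event_ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

On a typical setup the "launch only" figure should come out much smaller than the other two, which is the launch overhead mentioned above. The exact numbers depend on the GPU and driver.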