When a kernel is launched, is it synchronous or asynchronous with respect to the host? In other words, does the launch call return immediately, or does it not return until the kernel has finished?
Is it valid to use a CPU timing function to measure the elapsed time of a kernel execution, like this:
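LARGE_INTEGER start, end;
QueryPerformanceCounter(&start);
kernel<<<grid, block>>>(/* args */);   // grid/block are placeholder launch parameters
cudaDeviceSynchronize();               // wait for the kernel to finish (replaces deprecated cudaThreadSynchronize)
QueryPerformanceCounter(&end);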
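A minimal, portable sketch of the same measurement, assuming a CUDA-capable GPU, the CUDA runtime API, and std::chrono::steady_clock in place of the Windows-only QueryPerformanceCounter. The busy-wait kernel `spin_kernel` and the helper `time_spin_kernel` are hypothetical names used only for illustration. Timing the launch call on its own and then the launch plus cudaDeviceSynchronize() makes the difference between "launch returned" and "kernel finished" visible.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` GPU clock
// ticks so that the launch has a measurable amount of work.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // busy-wait on the device
    }
}

// Hypothetical helper: times the launch call alone (launch_ms) and the
// launch plus cudaDeviceSynchronize() (total_ms) with a host clock.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t time_spin_kernel(long long cycles, double* launch_ms, double* total_ms)
{
    using host_clock = std::chrono::steady_clock;

    // Warm-up launch so one-time context creation is not measured.
    spin_kernel<<<1, 1>>>(0);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) return err;

    auto t0 = host_clock::now();
    spin_kernel<<<1, 1>>>(cycles);   // control returns to the host immediately
    auto t1 = host_clock::now();
    err = cudaGetLastError();        // reports launch-configuration errors
    if (err != cudaSuccess) return err;
    err = cudaDeviceSynchronize();   // blocks the host until the kernel finishes
    auto t2 = host_clock::now();
    if (err != cudaSuccess) return err;

    *launch_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    *total_ms  = std::chrono::duration<double, std::milli>(t2 - t0).count();
    return cudaSuccess;
}
```

For example, calling `time_spin_kernel(100000000LL, &launch_ms, &total_ms)` should typically report a `launch_ms` much smaller than `total_ms`, since the launch call returns before the kernel completes. Without the synchronize call, a host timer would mostly be measuring launch overhead rather than kernel execution.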