Can you help me measure the run time and the memory transfer time?

I want to take some timings on my algorithm. Can you help me with how to do that?
First, I have to measure the total processing time, which is the easy one.
But I also have to measure the run time of the kernel alone, without the memory transfers.
Can you help me do that? Note that, as far as I know, the kernel call is asynchronous,
so this code will >>NOT<< work:

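// create and start timer
unsigned int timer = 0;
cutCreateTimer(&timer);
cutStartTimer(timer);

// kernel call (returns to the host immediately, before the kernel has finished)
TestKernel <<<gd , bd>>> ();

// stop and destroy timer
cutStopTimer(timer);
printf("Processing time: %f (ms) \n", cutGetTimerValue(timer));
cutDeleteTimer(timer);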

I also have to measure the transfer time on its own. Can you give me some information on how to do that?
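For what it's worth, one common way to get these numbers separately is to use CUDA events rather than a host timer. The sketch below is only an illustration under assumptions: the kernel body, its `(float*, int)` signature, the `Timings` struct, the `measureTimings` helper and the `CHECK` macro are all made up for the example and are not from the code in this post. It records an event before and after each step in the default stream, waits for the last event, and then reads the elapsed time between each pair. If you would rather keep a host timer, synchronizing with the device (for example `cudaDeviceSynchronize()`) before stopping the timer should also work, since that blocks until the kernel has finished.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for TestKernel: doubles every element.
__global__ void TestKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical container for the three measurements, in milliseconds.
struct Timings { float h2d_ms, kernel_ms, d2h_ms; };

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

Timings measureTimings(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float* h = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d = NULL;
    CHECK(cudaMalloc((void**)&d, bytes));

    // One event at each boundary: before H2D, after H2D, after kernel, after D2H.
    cudaEvent_t ev[4];
    for (int i = 0; i < 4; ++i) CHECK(cudaEventCreate(&ev[i]));

    int bd = 256;                  // threads per block
    int gd = (n + bd - 1) / bd;    // blocks in the grid

    CHECK(cudaEventRecord(ev[0]));
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(ev[1]));
    TestKernel<<<gd, bd>>>(d, n);  // asynchronous launch
    CHECK(cudaGetLastError());     // catches launch configuration errors
    CHECK(cudaEventRecord(ev[2]));
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(ev[3]));

    // Block the host until the last event (and everything before it) is done.
    CHECK(cudaEventSynchronize(ev[3]));

    Timings t;
    CHECK(cudaEventElapsedTime(&t.h2d_ms,    ev[0], ev[1]));
    CHECK(cudaEventElapsedTime(&t.kernel_ms, ev[1], ev[2]));
    CHECK(cudaEventElapsedTime(&t.d2h_ms,    ev[2], ev[3]));

    for (int i = 0; i < 4; ++i) CHECK(cudaEventDestroy(ev[i]));
    CHECK(cudaFree(d));
    free(h);
    return t;
}
```

Calling `measureTimings(1 << 20)` from `main` and printing the three fields with `printf` would give the host-to-device copy time, the kernel-only time and the device-to-host copy time. The very first CUDA call in a process also pays for context creation, so a warm-up run before the measured one is probably a good idea.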

async processing example