Hi, I want to run my entire application on the GPU to avoid memory transfers between the host and the device. However, my application involves a lot of sequential steps, so I have had to split it into many kernels. As a result, the application now has many kernel launches but no memory transfers, since the intermediate results stay on the GPU. Will the large number of kernel launches hurt performance?
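
To make the setup concrete, here is a minimal sketch of the pattern I am describing. It is not my actual code: the kernels `stageA` and `stageB`, the helper `time_launches`, and the buffer size and step count are all made up for illustration. It keeps a single buffer resident on the device, launches the kernels back to back with no host/device copies in between, and uses CUDA events to time the whole sequence.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stage kernels: each stands in for one sequential step.
__global__ void stageA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f;
}

__global__ void stageB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] + 1.0f;
}

// Launches 2 * steps kernels on a device-resident buffer and returns the
// elapsed GPU time in milliseconds, or -1.0f if a CUDA call failed.
float time_launches(int steps) {
    const int n = 1 << 20;            // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int s = 0; s < steps; ++s) {
        // Kernels in the same stream run in order; the data never leaves the GPU.
        stageA<<<blocks, threads>>>(d_data, n);
        stageB<<<blocks, threads>>>(d_data, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until all launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ok ? ms : -1.0f;
}

int main() {
    const int steps = 1000;           // assumed number of sequential steps
    float ms = time_launches(steps);
    if (ms < 0.0f) {
        printf("CUDA error\n");
        return 1;
    }
    printf("%d kernel launches took %.3f ms in total\n", 2 * steps, ms);
    return 0;
}
```

Timing the same run with different values of `steps`, or with the work of `stageA` and `stageB` merged into one kernel, would presumably show how much of the total is per-launch cost, but I am not sure how to interpret the numbers.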