Calling a kernel twice without moving data or calling cudaThreadExit()?

Hi everybody,
I am running 8 host threads simultaneously, and each of them uses the only GPU I have, a C1060. Each thread calls
cudaSetDevice( cutGetMaxGflopsDeviceId() );

Can I launch two kernels one after the other like this:

TestKernel_01<<< dimGrid, dimBlock >>>(pBitMapDevice, pBitMap_outDevice);
TestKernel_02<<< dimGrid, dimBlock >>>(pBitMapDevice, pBitMap_outDevice);

I mean: without copying the data back and forth, and without calling cudaThreadExit() and
cudaSetDevice( cutGetMaxGflopsDeviceId() ) again between the two calls?
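A minimal sketch of the pattern being asked about, assuming a CUDA 4.0 or later toolkit and that device 0 is the C1060 (cutGetMaxGflopsDeviceId() came from the old SDK's cutil library and is not used here). The kernel bodies, the extra size parameter `n`, and the helper `runPipeline` are hypothetical stand-ins, since the real kernels are not shown. Launches issued to the same stream execute in order, and device allocations stay valid until they are freed or the context is destroyed, so the two launches can share the buffers with a single copy back at the end. Calling cudaThreadExit() (deprecated since CUDA 4.0 in favor of cudaDeviceReset()) between them would destroy the context and its allocations. Each host thread allocates its own buffers, which also works under the pre-4.0 model where every host thread had its own context. With std::thread, older nvcc/glibc setups may need `-std=c++11 -lpthread`.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels: TestKernel_01 inverts each byte, TestKernel_02 averages
// that result with the input, so every output byte ends up as (255 - x + x) / 2 = 127.
__global__ void TestKernel_01(const unsigned char* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 255 - in[i];
}

__global__ void TestKernel_02(const unsigned char* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (out[i] + in[i]) / 2;  // reads what TestKernel_01 wrote
}

// Hypothetical helper: one upload, two launches, one download.
// Returns true if every output byte is 127.
bool runPipeline(int n)
{
    std::vector<unsigned char> host(n), result(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<unsigned char>(i % 256);

    // Assumption: device 0 is the C1060.
    if (cudaSetDevice(0) != cudaSuccess) return false;

    unsigned char *pBitMapDevice = nullptr, *pBitMap_outDevice = nullptr;
    if (cudaMalloc(&pBitMapDevice, n) != cudaSuccess) return false;
    if (cudaMalloc(&pBitMap_outDevice, n) != cudaSuccess) {
        cudaFree(pBitMapDevice);
        return false;
    }
    cudaMemcpy(pBitMapDevice, host.data(), n, cudaMemcpyHostToDevice);

    dim3 dimBlock(256);
    dim3 dimGrid((n + dimBlock.x - 1) / dimBlock.x);

    // Both launches go to the default stream, so TestKernel_02 starts only after
    // TestKernel_01 has finished; no copy or context teardown happens in between.
    TestKernel_01<<<dimGrid, dimBlock>>>(pBitMapDevice, pBitMap_outDevice, n);
    TestKernel_02<<<dimGrid, dimBlock>>>(pBitMapDevice, pBitMap_outDevice, n);
    cudaError_t err = cudaGetLastError();

    // A single copy back; cudaMemcpy waits for both kernels to complete.
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), pBitMap_outDevice, n, cudaMemcpyDeviceToHost);
    cudaFree(pBitMapDevice);
    cudaFree(pBitMap_outDevice);
    if (err != cudaSuccess) return false;

    for (unsigned char v : result)
        if (v != 127) return false;
    return true;
}

int main()
{
    const int kThreads = 8;        // the 8 host threads from the question
    const int kPixels = 1 << 16;   // hypothetical bitmap size
    std::vector<int> ok(kThreads, 0);  // one slot per thread, so no data race
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([&, t] { ok[t] = runPipeline(kPixels) ? 1 : 0; });
    for (auto& w : workers) w.join();

    int passed = 0;
    for (int v : ok) passed += v;
    std::printf("%d of %d threads OK\n", passed, kThreads);

    // Tear down the context only once, after every host thread is done with it.
    cudaDeviceReset();
    return passed == kThreads ? 0 : 1;
}
```

The reset sits at the very end because, from CUDA 4.0 on, all host threads share one context per device, and resetting it from one thread while others are still launching would invalidate their buffers.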

Thanks for your help.
Reinhold