Low fps when calling a CUDA kernel in a DirectShow filter

I have a DirectShow transform filter written in C++. Inside it I just call a kernel:

__global__ void kernel(int *leftView, int *rightView, int *destImage)
{
    // empty!
}

...

kernel<<<1, 512>>>(NULL, NULL, NULL);

...

I get a low frame rate when I use my filter in a graph that renders a video file.

But when I comment out the kernel call, the file renders in real time.

Why does an empty kernel call slow down the filter? Is it because the same GPU is used for both computing and displaying the video?
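
To separate the cost of the launch itself from anything the graph adds, the empty kernel could be timed on its own in a standalone program. The following is only a minimal sketch: the `cudaFree(0)` warm-up (to create the context before timing) and the launch count of 100 are assumptions made for illustration and are not part of the filter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same empty kernel as in the filter.
__global__ void kernel(int *leftView, int *rightView, int *destImage)
{
}

int main()
{
    // Assumption: force context creation up front so that one-time
    // initialization is not counted in the measurement below.
    cudaFree(0);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int launches = 100;  // arbitrary count, chosen for illustration

    cudaEventRecord(start, 0);
    for (int i = 0; i < launches; ++i) {
        kernel<<<1, 512>>>(NULL, NULL, NULL);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until all launches have finished

    // Report a failed launch instead of printing a meaningless time.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average time per launch: %f ms\n", ms / launches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

If the per-launch time measured this way is small compared with the frame interval, the slowdown more likely comes from how the call interacts with the filter (for example, context creation on the streaming thread or contention with the renderer) than from the kernel launch itself.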