Empty kernel significantly slows down program.


My program slows down even with the simplest kernel you can imagine. My code contains this line:

SimpleProcessing<<<dimGrid, dimBlock>>>();

dimGrid and dimBlock are one-dimensional, and the function does nothing:

__global__ void SimpleProcessing() {}

The rest of the program is very time-consuming (it performs face detection on images from a camera). This kernel is launched asynchronously for every frame. The rest of the program does not use the GPU.

With this kernel launch in place, processing one frame takes about 300 ms, but when I remove the launch it takes about 50 ms! This should not happen, since the kernel runs only on the device. Does anyone know where this slowdown comes from? Please help.


Are you timing the processing of the first frame? On the first CUDA call, the driver takes a while to initialize the device and load your kernels onto it. After that first call, the overhead of launching an empty kernel should be in the microsecond range.
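One way to check this is to time the first launch separately from later ones. The following is a minimal sketch under assumptions, not the original program: the 1x1 launch configuration, the iteration count, and the helper names TimeFirstLaunchMs and AverageLaunchUs are hypothetical. It times the first launch (which, as the first CUDA call, includes context setup) with a host clock, then times many back-to-back launches with CUDA events. A common way to move the setup cost out of the per-frame path is a warm-up call at program start, such as cudaFree(0) or one throwaway launch.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Empty kernel, as in the question.
__global__ void SimpleProcessing() {}

// Hypothetical helper: host wall-clock time of one launch plus synchronization.
// If this is the first CUDA call in the process, it includes context setup.
double TimeFirstLaunchMs() {
    auto t0 = std::chrono::steady_clock::now();
    SimpleProcessing<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait so the measurement covers the whole launch
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical helper: average GPU-side time per launch over many
// back-to-back launches, measured with CUDA events.
float AverageLaunchUs(int iters) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        SimpleProcessing<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until all queued launches finish
    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return totalMs * 1000.0f / iters;  // convert ms to us, per launch
}

int main() {
    double firstMs = TimeFirstLaunchMs();  // must be the first CUDA call
    float avgUs = AverageLaunchUs(1000);
    if (cudaGetLastError() != cudaSuccess) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("first launch (incl. setup): %.3f ms\n", firstMs);
    std::printf("average per launch after warm-up: %.3f us\n", avgUs);
    return 0;
}
```

Compiled with nvcc, the first number would be expected to be much larger than the per-launch average; the exact values depend on the GPU, driver, and CUDA version.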

Yeah, that's it. I looped it a few thousand times and the total only added up to milliseconds. Thanks for solving my problem, I did not expect the first call to take so much time!