My problem is that my program slows down even when it uses the simplest kernel you can imagine. My code contains a single line that launches it.
dimGrid and dimBlock are one-dimensional, and the kernel function doesn't do anything:
__global__ void SimpleProcessingl()
The rest of the program is very time-consuming (it does face detection on images from a camera). The kernel is launched asynchronously for every frame. The rest of the program doesn't use the GPU.
With the kernel launch present, processing one frame takes about 300 ms, but with the launch removed it takes about 50 ms! This shouldn't happen if the kernel only runs on the device. Does anyone know where this slowdown comes from? Please help.
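A minimal sketch of the setup described above might look like the following, in case it helps anyone reproduce the timing. The `<<<1, 1>>>` launch configuration, the `cpuWork` busy loop standing in for face detection, the `timeFrames` helper, and the frame count are all assumptions for illustration, not the actual code.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel, matching the one in the post.
__global__ void SimpleProcessingl() {}

// Hypothetical stand-in for the CPU-only face detection work.
static double cpuWork() {
    volatile double acc = 0.0;
    for (int i = 0; i < 20000000; ++i) {
        acc = acc + i * 1e-9;
    }
    return acc;
}

// Average wall-clock time per frame in milliseconds,
// with or without the kernel launch in the loop.
static double timeFrames(int frames, bool launchKernel) {
    auto start = std::chrono::steady_clock::now();
    for (int f = 0; f < frames; ++f) {
        if (launchKernel) {
            // Asynchronous launch: control returns to the host right away.
            SimpleProcessingl<<<1, 1>>>();
        }
        cpuWork();
    }
    // Wait for any outstanding kernels before stopping the clock.
    cudaDeviceSynchronize();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / frames;
}

int main() {
    // Force CUDA context creation up front so it is not counted in the timings.
    cudaFree(0);

    double withoutKernel = timeFrames(20, false);
    double withKernel = timeFrames(20, true);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("without kernel: %.2f ms/frame\n", withoutKernel);
    std::printf("with kernel:    %.2f ms/frame\n", withKernel);
    return 0;
}
```

If the two numbers from this sketch are close, the slowdown probably comes from something specific to the real program rather than from the launch itself.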