Hi all, I have a sequential optical flow algorithm that I recently parallelized using OpenCL. When I run the code on an NVIDIA GPU, the speedup is promising. But when I run it on an AMD or Intel CPU, it's slower than the sequential algorithm on the same CPU. Can anyone give me an idea of what causes this?
It really depends on the size of your problem and how you measure the run time, but most likely it's because of the overhead associated with OpenCL. If you wrote the kernels in OpenCL C, they have to be compiled before they can run on the CPU, and if that build step falls inside your timed region it counts against the OpenCL version. There is also a lot of overhead just in calling the kernel, which matters most when the problem is small. Native kernels run faster.
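To make the overhead argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names and all of the numbers (`SETUP`, `LAUNCH`, `WORKERS`) are made up for illustration and are not measurements from any real device. It assumes a one-time setup cost (for example, kernel compilation), a fixed cost per kernel launch, and otherwise perfect scaling across the CPU cores.

```python
def parallel_time(seq_time, workers, setup_overhead, launch_overhead, launches=1):
    """Estimated wall time of the parallel version under a fixed-overhead model.

    Assumes ideal scaling of the compute part (seq_time / workers) plus a
    one-time setup cost and a fixed cost for every kernel launch.
    """
    return setup_overhead + launches * launch_overhead + seq_time / workers


def speedup(seq_time, workers, setup_overhead, launch_overhead, launches=1):
    """Sequential time divided by the modeled parallel time (below 1 means slower)."""
    return seq_time / parallel_time(
        seq_time, workers, setup_overhead, launch_overhead, launches
    )


# Hypothetical values in seconds, chosen only to illustrate the trend.
SETUP = 0.2     # assumed one-time cost, e.g. compiling the kernel source
LAUNCH = 0.001  # assumed fixed cost per kernel launch
WORKERS = 4     # assumed number of CPU cores used

if __name__ == "__main__":
    for seq_time in (0.05, 0.5, 5.0, 50.0):
        s = speedup(seq_time, WORKERS, SETUP, LAUNCH)
        print(f"sequential {seq_time:6.2f} s -> modeled speedup {s:.2f}x")
```

Under these assumed numbers, the short runs come out slower than the sequential code because the fixed costs dominate, while the long runs approach the core count. If your measurements look like the short-run case, try timing the kernel execution separately from the setup, or use a larger input.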