Does a normal for loop in a kernel take more time than in a host function?

2011csgulfam163 · July 5, 2019, 11:39pm

Hi,
I am using kernels to process several char arrays in parallel. Each thread runs a for loop that checks whether its char array matches the target char array by comparing every element with the corresponding element of the target.
This takes much longer on the GPU than the same work takes on the CPU.
Can anyone help me understand the issue here?
Any suggestions for improvement would be appreciated.
Thanks,

T.D.Qiu · July 9, 2019, 8:48am

Can you post some code?
Also make sure you understand memory coalescing:
https://devblogs.nvidia.com/how-access-global-memory-efficiently-cuda-c-kernels/
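
To make the coalescing point concrete, here is a minimal sketch of what it could mean for a per-thread comparison loop like the one described. It is not the original poster's code. The kernel names, the `matchAll` host helper, and the two data layouts are assumptions made up for illustration, and it assumes every candidate has the same length as the target and that there is at least one candidate. Error checking is left out for brevity.

```cuda
#include <string>
#include <vector>
#include <cuda_runtime.h>

// Row-major layout (assumed): candidate i occupies data[i*len .. i*len+len-1].
// At loop step j, neighbouring threads read addresses len bytes apart,
// so the loads issued by a warp are not coalesced.
__global__ void matchRowMajor(const char* data, const char* target,
                              int n, int len, int* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int equal = 1;
    for (int j = 0; j < len; ++j)
        equal &= (data[i * len + j] == target[j]);
    result[i] = equal;  // 1 if candidate i equals the target, else 0
}

// Transposed layout (assumed): character j of candidate i is at data[j*n + i].
// At loop step j, neighbouring threads read neighbouring bytes,
// so the loads issued by a warp can be coalesced.
__global__ void matchTransposed(const char* data, const char* target,
                                int n, int len, int* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int equal = 1;
    for (int j = 0; j < len; ++j)
        equal &= (data[j * n + i] == target[j]);
    result[i] = equal;
}

// Hypothetical host helper: packs the candidates in the requested layout,
// runs the matching kernel, and returns one 0/1 flag per candidate.
std::vector<int> matchAll(const std::vector<std::string>& cands,
                          const std::string& target, bool transposed)
{
    int n = static_cast<int>(cands.size());
    int len = static_cast<int>(target.size());

    // Pack all candidates into one flat host buffer.
    std::vector<char> host(static_cast<size_t>(n) * len);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < len; ++j)
            host[transposed ? j * n + i : i * len + j] = cands[i][j];

    char* dData = nullptr;
    char* dTarget = nullptr;
    int* dResult = nullptr;
    cudaMalloc(&dData, host.size());
    cudaMalloc(&dTarget, len);
    cudaMalloc(&dResult, n * sizeof(int));
    cudaMemcpy(dData, host.data(), host.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(dTarget, target.data(), len, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (transposed)
        matchTransposed<<<blocks, threads>>>(dData, dTarget, n, len, dResult);
    else
        matchRowMajor<<<blocks, threads>>>(dData, dTarget, n, len, dResult);

    // This blocking copy also waits for the kernel to finish.
    std::vector<int> out(n);
    cudaMemcpy(out.data(), dResult, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dTarget);
    cudaFree(dResult);
    return out;
}
```

Both kernels should return the same flags; only the memory access pattern differs. Whether the transposed layout actually helps in your case depends on the array lengths, the number of arrays, and how much of the measured time is spent in host-to-device copies rather than in the kernel, so it is worth timing the kernel and the copies separately.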