Hi all,
I’m writing a program in CUDA and there is one operation that I have to perform many times. My current implementation works fine, but I wonder whether there is a faster way of doing it. Here is my case:
I have 2 char * buffers (‘a’ and ‘b’), each of size SCREEN_RES*numOfChars, so every pixel has numOfChars chars (NUM_OF_CHARS in the code). I need to test each pixel of ‘a’ against all the pixels of ‘b’. By ‘test’ I mean a logical operation that checks whether the chars of a pixel in ‘a’ are a subgroup of the chars of a pixel in ‘b’, that is, whether every bit set in ‘a’ is also set in the corresponding char of ‘b’. Here’s an example for 1 pixel:
numOfChars=2
a={1,2}
b={3,7}
(these are numeric byte values, not ASCII characters)
In binary:
a={00000001,00000010}
b={00000011,00000111}
This should return true because ‘a’ is a subgroup of ‘b’: 00000001 & 00000011 = 00000001 and 00000010 & 00000111 = 00000010. This is how I’m doing it:
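// Returns true if every bit set in the pixel of 'a' starting at aPos
// is also set in the pixel of 'b' starting at bPos.
__device__ bool isSubGroup(char * a, char * b, int aPos, int bPos)
{
    bool equals=true;
    uint i=0;
    // Stop at the first char of 'a' that has a bit missing in 'b'.
    while(equals && i<NUM_OF_CHARS)
    {
        equals=a[aPos+i]==(a[aPos+i]&b[bPos+i]);
        i++;
    }
    return equals;
}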
Both char * a and b are in global memory. I can’t use shared memory (it’s a bit complicated to explain, but it is not an option).
I go through all the chars of the pixel, ANDing each char of ‘a’ with the corresponding char of ‘b’. My question is: is there a faster way of doing this? For instance, comparing all the chars at once instead of using a while loop? Or is there some other operation that would be faster than this one?
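To make the "all chars at once" idea concrete, here is a rough sketch of what I have in mind. I have not benchmarked it. It assumes I could store both buffers as 32-bit words (4 chars packed per word) and pad NUM_OF_CHARS to a multiple of 4. The names isSubGroupWords, WORDS_PER_PIXEL, HOST_DEVICE, exampleA and exampleB, as well as the value 8, are made up for this example and are not part of my real code.

```cpp
#include <stdint.h>

// Lets the same function compile as plain C++ on the host or as CUDA device code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical value; assumed to be a multiple of 4 (padded if necessary).
#ifndef NUM_OF_CHARS
#define NUM_OF_CHARS 8
#endif
#define WORDS_PER_PIXEL (NUM_OF_CHARS / 4)

// Word-wide variant of isSubGroup: tests 4 chars per iteration.
// aWord and bWord are offsets in 32-bit words, not in chars.
// 'a' is a subgroup of 'b' when no bit is set in 'a' that is clear in 'b',
// i.e. (a & ~b) == 0, which is equivalent to a == (a & b).
HOST_DEVICE inline bool isSubGroupWords(const uint32_t *a, const uint32_t *b,
                                        int aWord, int bWord)
{
    uint32_t missing = 0;  // accumulates bits of 'a' that are not in 'b'
    for (int i = 0; i < WORDS_PER_PIXEL; ++i)
        missing |= a[aWord + i] & ~b[bWord + i];
    return missing == 0;
}

// Hypothetical example data: two pixels of 8 chars each, 4 chars per word.
// Pixel 0 of exampleA holds the chars {1,2,0,...}, pixel 1 holds {8,0,...}.
// Both pixels of exampleB hold the chars {3,7,0,...}.
static const uint32_t exampleA[4] = {0x00000201u, 0u, 0x00000008u, 0u};
static const uint32_t exampleB[4] = {0x00000703u, 0u, 0x00000703u, 0u};
```

With this data, isSubGroupWords(exampleA, exampleB, 0, 0) is true (chars 1 and 2 against 3 and 7, as in my example above), while isSubGroupWords(exampleA, exampleB, 2, 0) is false because bit 3 of the char 8 is not set in ‘b’. Note that this version drops the early exit of the while loop and always reads every word, so I’m not sure it would actually be faster for large NUM_OF_CHARS.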
I hope this is clear. Thank you very much.