I’m not very experienced with CUDA programming, but I have heard that conditional statements should be avoided because they lead to race conditions or run-time errors.
Here is my kernel. It compares the values of two arrays and increments a counter whenever the values are equal:
[codebox]
__global__ void oneMatch_kernel(bool *S, char *text, int *matches, int *count, int from, int to) {
    if (from < 0) from = 0;
    int idx = blockIdx.x*blockDim.x+threadIdx.x;
    if (idx < from || idx >= (to-1)) return;
    __syncthreads();
    int size = sizeof(text)/sizeof(char);
    for(int i=0; i<size; i++){
        if(text[i]==S[idx])
            matches[atomicAdd(&count[idx],1)];
    }
}
[/codebox]
The first two if conditions make sure that the thread indices do not go past the bounds of the arrays, since those threads would be wasted. The problem is the condition inside the body of the loop: it is never true, even though there are matches in the data. If anyone can help, please tell me how I can fix this.
Conditional statements do not, in general, lead to race conditions or run time errors. The GPU can very efficiently handle conditionals.
The problem with your code is that you have __syncthreads() on a path that not all threads reach: threads that return early never call it. __syncthreads() must be called by all threads in a block.
When I remove the __syncthreads() statement there is still a problem and it doesn’t work. Do you see anything else wrong in the code that could cause this?
In the statement int size = sizeof(text)/sizeof(char); I’m trying to get the length of the text array so that I can loop over it. There is no direct function for finding the length of an array, so I’m getting it by dividing the sizes.
This is not the right place to ask; it is more of a C and C++ question than a CUDA one. Programming is about using functions, so you need to know which functions are available. By the way, do you know how the char* type works? Learn that first.
I would expect sizeof(text) to always return the size of the pointer (sizeof(char*), which is 4 on a 32-bit build and 8 on a 64-bit one), not the length of the array. I think what you want to do is use strlen(text) in your host code to find the length of the string (assuming it is a null-terminated ASCII string) and pass that length as a parameter to the CUDA kernel.
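To make the difference concrete, here is a minimal sketch in plain C. The function names are made up for illustration and are not from the original code. It shows that sizeof applied to a char* parameter gives the pointer size however long the string is, that strlen gives the actual string length, and that the sizeof division only works on a true array that is still in scope.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inside a function, `text` is only a pointer, so
   sizeof(text) is sizeof(char *). Dividing by sizeof(char), which is 1,
   would not change the result. */
size_t length_via_sizeof(const char *text) {
    return sizeof(text);
}

/* Hypothetical helper: strlen counts characters up to the terminating '\0'. */
size_t length_via_strlen(const char *text) {
    return strlen(text);
}

/* The sizeof division does give an element count, but only when it is
   applied to a real array whose declaration is visible, not to a pointer. */
size_t elements_in_local_array(void) {
    int values[10] = {0};
    return sizeof(values) / sizeof(values[0]);
}
```

On the host you would call something like strlen(text) once and hand the result to the kernel as an extra argument, since the kernel only ever sees a pointer.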
If something this simple doesn’t work on the GPU, I suggest writing it on the CPU first. If it still doesn’t work there, that rather suggests there is something wrong with your code. This is especially valid if you’re new to C as well as to CUDA.
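As a rough illustration of that suggestion, here is a small CPU sketch of what the kernel's inner loop appears to compute for a single index. The function name and the text_len parameter are assumptions for this example, not part of the original code. One thing it makes easy to check: a bool converts to 0 or 1 in the comparison, so text[i] == S[idx] can only be true for characters whose value is 0 or 1, never for printable ASCII characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU reference for one thread of the kernel: count how many
   of the first `text_len` characters of `text` compare equal to S[idx].
   The length is passed in explicitly instead of being derived with sizeof. */
int count_equal_cpu(const bool *S, size_t idx, const char *text, size_t text_len) {
    int count = 0;
    for (size_t i = 0; i < text_len; i++) {
        /* S[idx] is promoted to the integer 0 or 1 before the comparison. */
        if (text[i] == S[idx]) {
            count++;
        }
    }
    return count;
}
```

If a reference like this reports zero matches for your test data, the comparison itself is the problem and not anything specific to the GPU.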
Actually, it is working in C. Yes, I’m a bit new to CUDA, but I’m doing my best with it. Below is the original C code that I’m trying to port to CUDA. If you can see what is wrong with the CUDA version, please help me.
[codebox]
int search1match (OneMatch *O, register uchar *text,
                  int from, int to, int *matches)
{
    register int n = from;
    register bool *S = *O;
    register int count = 0;
    if (n < 0) n = 0;
    while (true)
    {
        while (S[text[n++]]);
        if (n > to) break;
        matches[count++] = n;
    }
    return count;
}
[/codebox]
Thanks for what you said, it really helped. It never occurred to me that sizeof(text)/sizeof(char) would differ from strlen(text), since it works properly with arrays of ints. Anyway, thanks.