Problem: kernel returning wrong results

Hi all, I'm kind of new to CUDA programming and I'm trying to implement a kernel for this C++ function:

[codebox]int search1match (OneMatch *O, register uchar *text,
                  int from, int to, int *matches)
{
    register int n = from;
    register bool *S = *O;       /* table indexed by character value */
    register int count = 0;

    if (n < 0) n = 0;
    while (true)
    {
        while (S[text[n++]]);    /* skip characters c with S[c] == true */
        if (n > to) break;
        matches[count++] = n;    /* n is one past the char where S[c] == false */
    }
    return count;
}[/codebox]

and this is my kernel:

[codebox]__global__ void oneMatch_kernel(bool *S, char *text, int *matches,
                                int *count, int from, int to)
{
    if (from < 0) from = 1;

    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < from) return;

    while (S[text[i]]);

    if (i > to) return;
    matches[count[0]++] = i;
}[/codebox]

The function works by comparing characters; each time the characters match, it increments the variable count. But the values returned by the kernel are wrong, so if anyone can help it would be great.
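For context, this is roughly how I drive the CPU version (simplified; I'm assuming here that OneMatch is a 256-entry bool table, which matches how *O is used above):

[codebox]/* simplified driver -- OneMatch is assumed to be a 256-entry bool
   table, so *O decays to the bool* the function reads from */
#include <cstring>
typedef unsigned char uchar;
typedef bool OneMatch[256];

int main()
{
    uchar text[] = "axbxc";
    int   matches[16];

    OneMatch O;
    for (int c = 0; c < 256; c++) O[c] = true; /* true = keep scanning */
    O['x']  = false;   /* the character we want to hit              */
    O['\0'] = false;   /* sentinel so the inner loop cannot overrun */

    int count = search1match(&O, text, 0,
                             (int)strlen((char *)text) - 1, matches);
    /* count == 2; matches[] = {2, 4}, one past each 'x' */
    return 0;
}[/codebox]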

thanks in advance

Try to debug your kernel and see what is going on.

I've been debugging it for days and I can't seem to figure out what's going on.

Why count[0]++ on the last line? And while (S[text[i]]); is very strange. It looks like you have wrong assumptions about how a GPU program works: it runs in parallel, it is not a loop.

count is what I want to return from the kernel, but I couldn't seem to use a plain int variable, so I used an array and incremented its first element. I know I shouldn't use this while loop, but I want each thread to loop over the array. The thing is, it works perfectly in the serial code.

Did you check in debug mode that the while loop runs only once? How do you debug your program on the GPU?

I tried to debug it using Visual Studio, but I couldn't find a way to debug the kernel. If you know any way I can debug it, that would be of great help.

Thanks

Use device emulation mode.

How can I use that?

Check the documentation: Programming Guide, section 3.2.9, "Debugging using the Device Emulation Mode".
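If memory serves, you build with nvcc's -deviceemu flag; the kernel then runs on the CPU, so you can set breakpoints inside it and even call printf from it (the exact details are in the section above):

[codebox]nvcc -deviceemu -g -o myprog myprog.cu[/codebox]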

You can’t just copy/paste serial code into CUDA :)

The count[0]++ means that all threads in all blocks will concurrently write to the same location in memory, creating a race condition and obviously producing faulty results.

Either use atomic functions or re-implement your algorithm to be thread-safe. This really is not a CUDA/GPU issue but a general multi-threading issue.
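To see why: count[0]++ is not one instruction, it is a read-modify-write sequence:

[codebox]/* what count[0]++ actually does: */
int tmp = count[0];   /* 1. load  */
tmp = tmp + 1;        /* 2. add   */
count[0] = tmp;       /* 3. store */
/* Two threads can both do step 1 before either does step 3: both read
   the same old value, both store old+1, so one increment is lost and
   both threads then write to the same matches[] slot. */[/codebox]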

eyal

Can you give me a quick hint about atomic functions, like how can I use them?

The simpleAtomicIntrinsics sample in the SDK and the programming guide (atomicAdd/atomicSub) might be a good starting point.
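For illustration only, an untested sketch of your kernel with atomicAdd; it returns the old value, which gives each match a unique slot. Global-memory atomicAdd needs compute capability 1.1 or higher, and *count must be zeroed before the launch:

[codebox]__global__ void oneMatch_kernel(const bool *S, const unsigned char *text,
                                int *matches, int *count,
                                int from, int to)
{
    if (from < 0) from = 0;

    int i = blockIdx.x*blockDim.x + threadIdx.x;
    /* the serial code records n = p + 1 for a match at p, and only
       while n <= to, so valid positions are from .. to-1 */
    if (i < from || i >= to) return;

    if (!S[text[i]]) {                   /* match at position i       */
        int slot = atomicAdd(count, 1);  /* reserve a unique slot     */
        matches[slot] = i + 1;           /* +1 to mirror the CPU code */
    }
}[/codebox]

Note that the matches will land in matches[] in arbitrary order, unlike the serial version.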

Mind you, atomics are slow; if you can partition your code to be thread-safe without atomics, it will probably run faster.
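One way to do that (again just a sketch): let each thread write a 0/1 flag for its own position, so no two threads ever touch the same word, then compact the flags afterwards:

[codebox]__global__ void flag_kernel(const bool *S, const unsigned char *text,
                            int *flags, int from, int to)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < from || i >= to) return;   /* flags[] must be zeroed first */
    flags[i] = S[text[i]] ? 0 : 1;     /* 1 where there is a match     */
}

/* host side, after copying flags[] back (or do it with a scan kernel) */
int compact(const int *flags, int n, int *matches)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (flags[i]) matches[count++] = i + 1; /* +1 as in the CPU code */
    return count;
}[/codebox]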

OK, I'll check it out. Thanks for the help.