I have just started using CUDA for some two-dimensional pattern-matching experiments (locating a two-dimensional pattern in a two-dimensional text). I run the code inside a kernel and the same code on the CPU to measure the performance difference. With just one thread I get something like this:
Kernel: 0.000027 seconds
Cpu: 0.003368 seconds
Is it normal to get such a performance increase with just one thread? I get correct results; I am calling __syncthreads() at the end of the kernel and cudaThreadSynchronize() before reading the timer again.
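For reference, a more robust way to time a kernel is with CUDA events, which are recorded on the device and do not depend on the host timer. A minimal sketch, where match2d stands in for the actual pattern-matching kernel (the name and parameters are placeholders):

```cuda
#include <cstdio>

// Hypothetical kernel; stands in for the 2D pattern-matching code.
__global__ void match2d(const char *text, const char *pattern, int *result)
{
    // ... pattern-matching work ...
    __syncthreads();
}

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    match2d<<<1, 1>>>(NULL, NULL, NULL);  // one thread, as in the experiment
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);           // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Kernel: %f seconds\n", ms / 1000.0f);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

Because the elapsed time is taken between two device-side events bracketing the launch, this measures the kernel itself rather than just the asynchronous launch call.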
Check for errors in the kernel launch. A trivial kernel launch usually costs around 30 to 40 microseconds, and yours is about 30 microseconds, so it looks like you are having a kernel launch error.
Use cudaGetLastError() or a similarly named API call right after the launch.
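In practice that check looks something like the following, where match2d and the launch configuration are placeholders for the actual code:

```cuda
// After the launch, check both launch errors and execution errors.
match2d<<<grid, block>>>(d_text, d_pattern, d_result);

cudaError_t err = cudaGetLastError();        // catches launch failures
if (err != cudaSuccess)
    printf("Launch error: %s\n", cudaGetErrorString(err));

err = cudaThreadSynchronize();               // catches errors during execution
if (err != cudaSuccess)
    printf("Execution error: %s\n", cudaGetErrorString(err));
```

The second check matters because kernel launches are asynchronous; an error inside the kernel only surfaces at the next synchronizing call.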
The code was fine. I verified that the values were correct by running under the emulator with printf's, and cudaGetLastError() reported no error.
As soon as I wrote a value from the kernel into a shared variable after the __syncthreads() call, cutGetTimerValue() started measuring the correct time:
Kernel: 0.729799 seconds (for 1 thread)
Cpu: 0.003386 seconds
Why does this behavior exist? Does the kernel always have to return some form of data to the host program?
Hmm, maybe you're right, although that's not the "correct" way to optimize code. Which compiler is CUDA using? GCC definitely doesn't have this behavior.
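The likely explanation is dead-code elimination: if a kernel never writes its results anywhere observable, nvcc is free to remove the computation entirely, so only the launch overhead remains and the timing looks absurdly fast. A sketch of the difference, with hypothetical kernel names and a made-up workload:

```cuda
// Version 1: the result is never stored, so the compiler may eliminate
// the whole loop; the kernel body compiles to (almost) nothing.
__global__ void match2d_dead(const char *text, const char *pattern)
{
    int matches = 0;
    for (int i = 0; i < 1000000; ++i)
        matches += (text[i % 100] == pattern[i % 10]);
    // 'matches' is unused: dead code.
}

// Version 2: storing the result to memory makes the computation
// observable, so the compiler must keep it and the measured time is real.
__global__ void match2d_live(const char *text, const char *pattern, int *out)
{
    int matches = 0;
    for (int i = 0; i < 1000000; ++i)
        matches += (text[i % 100] == pattern[i % 10]);
    *out = matches;  // side effect keeps the work alive
}
```

So the kernel does not have to return data in general, but work whose results are unreachable can be optimized away, which is what made the first measurement meaningless.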