finding frequency of letters

biebo · April 13, 2010, 1:52pm

Hi all,

Does anyone know how to find the frequency of letters in a text efficiently with CUDA?

Thanks

–Biebo

CapJo · April 13, 2010, 3:10pm

You can try a reduction with the type of letter as the bin. There is a reduction example in the CUDA SDK.

If it’s your only task on the GPU, you will be better off on the CPU using OpenMP.

Lev · April 13, 2010, 3:16pm

It depends on how long your text is.

Simon_Green · April 13, 2010, 3:32pm

This is a very similar problem to image histograms - see the histogram sample in the SDK.

CapJo · April 13, 2010, 3:47pm

I meant the histogram example, not the reduction one. Reduction has nothing to do with bins.
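For readers who want to try the CPU route first, the binning idea can be sketched as below: each thread fills a private 256-bin histogram and the partial results are merged at the end, which is also the general shape of a GPU histogram. This is only a minimal sketch, not the SDK sample; the name `count_bytes` and the `Histogram` alias are made up for illustration, and it assumes the text is already in memory as single-byte characters. Built without OpenMP support, the pragmas are ignored and the code simply runs serially.

```cpp
#include <array>
#include <cstddef>
#include <string>

// One bin per possible byte value (hypothetical alias for this sketch).
using Histogram = std::array<std::size_t, 256>;

// Hypothetical helper: counts how often each byte value occurs in `text`.
Histogram count_bytes(const std::string& text)
{
    Histogram total{};  // zero-initialized result
    const long long n = static_cast<long long>(text.size());

    #pragma omp parallel
    {
        Histogram local{};  // private histogram, one per thread

        #pragma omp for nowait
        for (long long i = 0; i < n; ++i)
            // Cast so that bytes >= 128 do not become negative indices.
            ++local[static_cast<unsigned char>(text[i])];

        // Merge the private histograms one thread at a time.
        #pragma omp critical
        for (std::size_t b = 0; b < local.size(); ++b)
            total[b] += local[b];
    }
    return total;
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Whether the extra threads help at all depends on whether the memory bus is already saturated.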

tera · April 14, 2010, 1:34am

I agree with CapJo. If this is the only task, it’s entirely bandwidth-limited. And on decent CPUs/mainboards, PCIe bandwidth will be smaller than memory bandwidth, so even the data transfer to the graphics card (without any computation) is more expensive than doing everything on the CPU. I’d even guess that just one thread would be enough to saturate the memory controller, so that not even OpenMP could improve the speed (except on a NUMA system).

Not to mention the fact that the texts are likely to come from a hard disk…

marijnfs · April 14, 2010, 9:45am

I agree, just implement it on the CPU and see what the limiting factor is. Here is a C++ program that should do it. If the limit turns out to be your CPU, you can start thinking about the GPU:

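[codebox]
#include <cstddef>
#include <fstream>
#include <vector>

using namespace std;

int main()
{
    vector<size_t> count_vector(256); // one bin per byte value, or whatever
    ifstream the_file("the_file_name");

    while (true)
    {
        char c; // might have to use the unicode character, depends on your file
        // operator>> skips whitespace; stop on EOF or any read error
        if (!(the_file >> c))
            break;
        // cast so a signed char never produces a negative index
        ++count_vector[static_cast<unsigned char>(c)];
    }
}
[/codebox]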

cbuchner1 · April 14, 2010, 11:05am

I can’t think of a more inefficient way of reading a file than reading character by character and checking for an EOF condition after every character read. You’ll spend most of your CPU cycles in the ifstream object instead of histogramming.

marijnfs · April 14, 2010, 12:11pm

Naah, it shouldn’t matter, as most posts suspect the hard disk is the limiting factor, and this example can prove that, since it should be faster than the disk (if the ifstream buffer is large enough).

seibert · April 14, 2010, 7:38pm

I wouldn’t be so sure… On my computer, your code can only read 21 MB/sec from a file that is already cached in memory. A version which reads blocks of 4096 bytes at a time can read 570 MB/sec. Most hard drives should be able to do much better than 21 (but obviously won’t hit 570 unless you have a striped set of SSDs), so the character-by-character method really could be leaving performance on the table.
