Using CUDA for Log Analysis

ssargent · May 6, 2010, 10:32pm

Hello, I’m a first-time CUDA user. I’ve run through a bunch of the CUDA samples and a number of the Thrust examples as well. I’m ready to try my first program, and I was hoping I could get some advice on implementation. What I want to create is an HTTP server log analysis tool.

The basic workflow would be to feed it a log file; it would parse each record and give a count of the number of successful requests (status code 200).

I was thinking of using a Thrust zip iterator over two fields: an int StatusCode and a char* LogRecord. The first use of CUDA would be selecting some number of records from the file and sending them to the GPU to be parsed. In this case I’m just looking for the status code, so the parsing would be fairly simple: walk the char array until you reach the N-th delimiter, which marks the start of the status code, then take that code and write it back through the zip iterator.
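To make the "walk to the N-th delimiter" step concrete, here is a minimal sketch in C++. The helper name parse_status_code, the HOST_DEVICE macro, and the idea of passing the delimiter and field position as parameters are assumptions for illustration, not something from the post; the macro is only there so the same function could be compiled by nvcc for device code or by a plain C++ compiler.

```cpp
// Expands to CUDA qualifiers under nvcc and to nothing under a host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: returns the integer that starts right after the
// n-th delimiter in line[0..len), or -1 if that field is missing or
// does not begin with a digit.
HOST_DEVICE inline int parse_status_code(const char* line, int len,
                                         char delim, int n) {
    int i = 0;
    int seen = 0;
    // Skip forward until n delimiters have been consumed.
    while (i < len && seen < n) {
        if (line[i] == delim) {
            ++seen;
        }
        ++i;
    }
    if (seen < n) {
        return -1;  // the record has fewer fields than expected
    }
    // Accumulate decimal digits; cap the count so the int cannot overflow.
    int code = 0;
    int digits = 0;
    while (i < len && digits < 9 && line[i] >= '0' && line[i] <= '9') {
        code = code * 10 + (line[i] - '0');
        ++i;
        ++digits;
    }
    return digits > 0 ? code : -1;
}
```

For a space-delimited record such as "a b c 200 512", calling parse_status_code(line, 13, ' ', 3) returns 200. Which delimiter and which N apply depends on the actual log format, which the post does not specify.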

What I wasn’t sure about is whether this is a reasonable use for CUDA: parsing a log file entry in a thread on the GPU. Entries can get quite large (4,500 characters), but normally they are much more reasonably sized. Obviously I’ll need to carefully monitor how many entries I send in at once and make sure to batch them.

Once I had an array of int status codes, I could use some sort of reduce function to remove the ones I don’t care about and then take a count. Overall it’s a very simplistic program, but a decent first start.
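As a sketch of the counting step, the filter-then-count idea collapses into a single count over the array. The version below uses std::count on the host; count_status is a hypothetical name. Thrust provides thrust::count with the same (first, last, value) shape, so a device version would look much the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many parsed status codes equal `wanted`
// (200 for successful requests). Codes that failed to parse can be stored
// as a sentinel such as -1 and are simply never matched.
inline std::size_t count_status(const std::vector<int>& codes, int wanted) {
    return static_cast<std::size_t>(
        std::count(codes.begin(), codes.end(), wanted));
}
```

Counting directly avoids building an intermediate array of only the codes of interest.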

Thank you in advance for fielding a newbie question!

Scott

SPWorley · May 7, 2010, 2:25am

The job isn’t a good match for a GPU. There’s almost no computation needed, so the run time will be dominated by memory transfer over the PCIe bus. The CPU can stream the data through RAM in less time than it takes to send the data to the GPU.

And even that assumes the entire log is already loaded into RAM. If it’s on disk, then disk I/O will be your bottleneck even with a pretty slow single-core CPU.
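A rough back-of-the-envelope sketch of that argument is below. All three throughput figures are round-number assumptions chosen for illustration, not measurements: 8 GB/s is the theoretical one-direction peak of a PCIe 2.0 x16 link, while the disk and RAM rates are guesses that vary widely by hardware.

```cpp
// Assumed, illustrative throughput figures in bytes per second.
constexpr double kDiskBytesPerSec = 100e6;  // assumed sequential read, one hard disk
constexpr double kPcieBytesPerSec = 8e9;    // PCIe 2.0 x16 theoretical peak, one direction
constexpr double kRamBytesPerSec  = 10e9;   // assumed CPU streaming rate through RAM

// Time to move `bytes` at a sustained rate of `bytes_per_sec`.
constexpr double seconds_to_move(double bytes, double bytes_per_sec) {
    return bytes / bytes_per_sec;
}
```

Under these assumptions a 1 GB log (1e9 bytes) takes about 10 s to read from disk, 0.125 s to cross the PCIe bus, and 0.1 s to stream through RAM on the CPU. The GPU path pays the PCIe copy before any parsing starts, by which time the CPU could already have scanned the data, and the disk read is far larger than either.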