Contest Ideas: Post Your Contest Ideas!

Everyone post up your contest ideas here…I missed the last contest, and I would like to get in the next one…only it seems there hasn’t been another one yet. Post your best contest ideas here, and maybe we can get the next challenge going!

My Ideas:

  • cuLAPACK – LAPACK wrapper for cuBLAS (the standard LAPACK library is mostly implemented through calls to BLAS, so this shouldn’t be too difficult)

  • Object-oriented wrappers for CUDA – Provide object-oriented wrappers for the CUDA libraries so that “less-advanced” developers can make use of them without having to get into heavy C coding. Popular choices might be Python, C++, C#/VB.NET, Java.

  • x264 Encoder/Decoder – Even though nVidia wants to shy away from codec coding contests (as they target only a small audience of developers), building an H.264 encoder/decoder would be a great help to the many people who transcode videos (to their iPods, for example).

  • Game Engine – Implement an AI engine for a game (e.g., Othello, Chess, Checkers, Poker, FPS Bots, etc.) that uses the multi-processing capabilities of CUDA. Developers will need to submit a working game for this contest; any type of game may be submitted, as long as it makes use of the CUDA engine.

  • Database Processing – Implement an extension to a database that will use CUDA to do parallel processing (e.g. searches, aggregations, etc.). An alternative idea is to use SQLite (written in C, and in the public domain), keeping the code that reads/writes files to disk, but otherwise implementing the rest of the functionality in CUDA-based code (so that the database essentially runs on the GPU).

  • CIL Translator – Implement a “translator” (JIT compiler?) for CUDA so that kernels can be written in .NET languages (C#, VB.NET, F#, etc.). This might be done by completing a partial port of the Mono library. If this could be implemented, then a follow-up project may be to implement the PFX (Parallel FX) library to let any .NET developer easily write concurrent/parallel programs.

  • Virus Scan – I read an article about how AMD/ATI and an anti-virus company used GPGPU calculations to speed up virus scanning…perhaps someone could implement the same technology on an nVidia card?

How about something simple?

Like the fastest Game of Life implementation on a 16384x16384 matrix… ;)


Steganography decoding - This can help ISPs scan picture attachments for coded information.

Yeah… these aren’t so much contests as “really big things that would be cool if someone got around to doing them.”

I’m thinking a contest is a simple (but not too simple) algorithm that you optimize the crap out of, and compare who’s the 1337est optimizer. By definition, simply getting it to work should be the easy part.

Why not have nVidia pick a game and set up an internet game server…then we could write AIs using CUDA to compete and see who wins at the end of the month…

How about a simple sort routine?
You’re fed 5M floats in device memory, you have to write them into a new array in sorted order. There could be many “challenge” arrays (some random, some almost-sorted, some with weird clumps). Total time to sort them all correctly wins.
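To make the rules concrete, here is a rough sketch of how such a contest could be judged. It is a CPU-side mock-up in Python, not CUDA code, and everything in it is an assumption for illustration: the names `make_challenges`, `is_correct`, and `score` are made up, the way the "almost sorted" and "clumped" arrays are generated is just one guess, and a real harness would time kernels on the device rather than a Python callable.

```python
import random
import time

def make_challenges(n, seed=0):
    """Build the three kinds of challenge arrays the post mentions."""
    rng = random.Random(seed)
    uniform = [rng.random() for _ in range(n)]
    # Almost sorted: start sorted, then swap a small fraction of random pairs.
    almost = sorted(rng.random() for _ in range(n))
    for _ in range(max(1, n // 100)):
        i, j = rng.randrange(n), rng.randrange(n)
        almost[i], almost[j] = almost[j], almost[i]
    # Clumped: values drawn tightly around a handful of centers.
    centers = [rng.random() for _ in range(8)]
    clumped = [rng.choice(centers) + rng.gauss(0.0, 1e-3) for _ in range(n)]
    return {"random": uniform, "almost_sorted": almost, "clumped": clumped}

def is_correct(original, result):
    """The result must hold the same values in non-decreasing order."""
    return result == sorted(original)

def score(sort_fn, challenges):
    """Return total seconds over all arrays, or None if any output is wrong."""
    total = 0.0
    for data in challenges.values():
        src = list(data)  # each run gets its own copy of the input
        start = time.perf_counter()
        out = sort_fn(src)
        total += time.perf_counter() - start
        if not is_correct(data, out):
            return None
    return total
```

For example, `score(sorted, make_challenges(5_000_000))` would give a baseline time for Python's built-in sort, and any entry that returns a wrong array is disqualified by getting `None`.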

This is an “easy” problem to understand and lets people get started quickly. It’s straightforward and not hard to start coding reasonable solutions.

This is a “hard” problem in that there are many, many strategies, and the winner will likely be a very clever coder who knows sort algorithms well and knows how to use local memory caches and effective partitioning.
The gap between the winner and second place will likely be very small.

This is a USEFUL problem in that the winning code would be something other CUDA users or implementers could learn from or use as a library.

The negative is that it’s not an impressive application; it can’t be used by Nvidia marketing as a cool web graphic with a bold “100x speedup!!!” logo over it.

You’d be surprised how many ways there are to optimize a cellular automaton like Game of Life.

There are some variants using bit packed representations of the matrix, applying SSE streaming operations for computing the next states.

On a GPU you’re likely more memory limited than compute limited, hence efficient packing of the cell state would be a requirement to win the challenge.

There are also some interesting approaches that use giant hash tables to transition whole blocks of cells to future states. These are the fastest implementations known to date, but would they map well to a GPU? I doubt it.
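As a rough illustration of the bit-packed idea (not the hash-table one), here is a small Python sketch. Each row is stored as one integer with one bit per cell, and the neighbor counts for a whole row are accumulated at once with bitwise adder logic. The function name `step` and the dead-border rule are my own assumptions; a CUDA version would presumably do the same thing on 32-bit or 64-bit words.

```python
def step(rows, width):
    """Advance a bit-packed Life grid by one generation.

    rows[y] is an int whose bit x is 1 when cell (x, y) is alive.
    Cells outside the grid are treated as dead (an assumption; a
    contest could just as well specify wrap-around edges).
    """
    mask = (1 << width) - 1
    padded = [0] + list(rows) + [0]
    out = []
    for y in range(1, len(padded) - 1):
        above, cur, below = padded[y - 1], padded[y], padded[y + 1]
        # The eight neighbor bit-planes, each aligned to the cell's own bit.
        planes = [
            (above << 1) & mask, above, above >> 1,
            (cur << 1) & mask, cur >> 1,
            (below << 1) & mask, below, below >> 1,
        ]
        # Bit-sliced counter: ones/twos are the low two bits of the count,
        # fours is set (and stays set) once the count reaches 4 or more.
        ones = twos = fours = 0
        for p in planes:
            carry = ones & p
            ones ^= p
            fours |= twos & carry
            twos ^= carry
        # Alive next turn: exactly 3 neighbors, or exactly 2 and alive now.
        out.append(twos & ~fours & (ones | cur))
    return out
```

A horizontal blinker `[0, 0, 0b01110, 0, 0]` on a 5-wide grid turns into a vertical one after one call and back again after two, which makes a quick sanity check.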


Yeah, game of life sounds cool. I’d love to do it, especially since I’ve never taken a look at how the game of life works and it’s great to explore something new.

Man, I’m really itching to do a contest. I’m a bit apprehensive because I know vvolkov will be a formidable opponent, but I’m willing to take even him on ;-)

I am a student at IIT Delhi. I want to participate in this contest. Could you please explain the “Database processing on CUDA” problem in more detail?

Hi all

I have some CUDA programming experience. I took Johannes Schmid’s sequential fire simulation and wrote a parallel implementation of it, parallelizing the computations (I am not strong in graphics, so I left the display to the CPU). I want to continue working on CUDA in my major project. I am interested in the x264 encoder/decoder and the object-oriented wrappers ideas; could someone explain them in more detail? I am also ready to work on something new. I have one year to dedicate to a major project, so I can do a lot. The only problem is that I don’t have any good ideas. If someone could point me to previous work that I could carry on, or to new ideas like the ones on this page, it would be very useful.

That virus scan link doesn’t work anymore.


Note: tapas (the other person who posted above) and I are both from IIT Delhi. Sorry for reposting; I wanted to edit my post but ended up making another reply.

Contest suggestion:

A CUDA-based decoder for color JPEGs (without implementing any fancy options, such as lossless or progressive JPEG). The more functionality performed by CUDA, the better. The winner is the one who decodes a reference JPEG (say, a 4096x4096 pixel behemoth at high quality) the fastest on some standardized CUDA hardware. A bonus contest could involve smart postprocessing for deblocking low-bitrate JPEGs, judged on visual quality rather than processing speed.

The main challenge in this contest is to understand how parallelism might be applied to the decoding of Huffman codes (if it is possible at all). To give a good incentive for putting the Huffman decoding on the GPU, one could standardize on an Atom-based platform (e.g. nVidia ION) for judging the contest, since the CPU there is particularly weak. ;-)
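To show why the Huffman step is the awkward part, here is a tiny Python sketch of a prefix-code decoder. The four-entry `TOY_TABLE` is made up for illustration and has nothing to do with the tables in a real JPEG file; the point is only that the decoder cannot know where a symbol starts until it has decoded every symbol before it.

```python
# Hypothetical toy prefix code, not a real JPEG Huffman table.
TOY_TABLE = {"0": "a", "10": "b", "110": "c", "111": "d"}

def decode(bits, table):
    """Decode a string of '0'/'1' characters with a prefix-code table."""
    symbols, code = [], ""
    for bit in bits:
        code += bit
        if code in table:
            # A full code word was matched; only now is the start of the
            # next symbol known.
            symbols.append(table[code])
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete code")
    return symbols
```

Here `decode("0101100111", TOY_TABLE)` walks the bits strictly left to right. A thread dropped into the middle of the stream has no way of telling whether it landed on a symbol boundary, and that is the dependency a parallel decoder would have to work around.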

Motivation: JPEG is reasonably uncomplicated, and its basics are commonly taught in computer science and other engineering classes. There are simple JPEG decoders available that might serve as a starting point (and I don’t mean the IJG source code).

A jpeg decoder by Pierre Guerrier with modifications by Koen van Eikj:
(see djpeg_orig folder)
A simple JPEG to BMP converter with Windows GUI based on the above code:
The JPEG standard (for reference only):
U Stanford JPEG encoder+decoder V1.2.1 (mirror):…ics/JPEG-1.2.1/
A link to Wikipedia on JPEG:


I like this idea especially because it’s simple enough to make a quick simulator no matter what your skill level is, but the problem has layers of complexity you could dive into (like those hash caches, to detect regions in simple repeating states.)

From the LAME encoder contest, I think it’s clear that it has to be easy for people to get started and start playing, and Life’s simplicity makes it attractive.

Just noticed this: Intel Threading Challenge Contests

The prizes are all trivial, really just a total of $2000 split among 12 contests, but it’s interesting to see the flavor of the contests as well as the community that’s grown up around them. The different challenges are all pretty simple and fundamental, but not toys: searching, sorting, geometric queries, etc.
There have been 6 contests so far, with another 6 to come. All of the contests so far could conceivably be templates for GPU contests as well.

Another good CUDA challenge would be to build an open source (thus, free) Ogg Theora decoder. I just read a story about how Wikipedia is moving all of their video to this format, but it said that Microsoft/Google/Apple/et al. didn’t like it and so they weren’t going to be building support for it into their new products. Even if it’s not as good as H.264 (like Google claims), it would be useful to have a good encoder/decoder that was free, since Ogg codecs have free licenses as well (good for schools and so forth).

A card, or part of a card, that handles the post-processing, so that one can use all the cool effects that makes possible.