need suggestions for GPU programming ~ parallel processing

Hey guys, I’m working on a project and could use your help with some suggestions.
I need suggestions for programs I could write that use the GPU for processing, with a focus on parallel processing.
Just a little more info on this project:

  • It’s a plugin based program.
  • Users are able to load a plugin, and the software will “run” the plugin.
  • The plugins must be programs that use the GPU as the processing unit, and such programs must focus on parallel processing.
  • Some plugins can be focused on distributed computing instead of on the GPU (for instance, many peers could connect to solve the same problem).

Does it make sense? I guess I’m not sure what I can / cannot do with the GPU.
From my research so far, it looks like I’ll be using CUDA or OpenCL for programming plugins that use the GPU, and I’ll be using OpenMPI for plugins that communicate with multiple peers.

This is part of my senior project for my CS degree, and my professor has approved it, so if you guys could help me figure out what kinds of programs (plugins) I could write for the GPU (that focus on parallel processing), I’d appreciate it a lot.
I chose parallel processing as the focus of my project, knowing that GPUs have hundreds or even thousands of cores.
Any help or suggestions are appreciated, thanks!

I would assume it depends on the type of plugin. What type are you trying to write?

Well, I need to write plugins that show the use of parallel processing. There’s no specific type. So far I’ve thought of programs like using the GPU for generating prime numbers, which can be implemented in parallel - http://stackoverflow.com/questions/12559788/parallel-algorithms-for-generating-prime-numbers-possibly-using-hadoops-map-re

I need to write at least 6 different plugins that show parallel processing, so any suggestions are appreciated.
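In case it helps make the prime idea concrete, here’s a minimal CPU-side sketch of the data-parallel approach, using a Python process pool as a stand-in for GPU threads (in a CUDA kernel each candidate number would be tested by its own thread; every name below is made up for illustration, not part of any real plugin API):

```python
# CPU sketch of data-parallel primality testing. On a GPU each number would
# get its own thread; here a process pool plays that role.
from multiprocessing import Pool

def is_prime(n: int) -> bool:
    """Trial division: the independent per-number work a GPU thread would do."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primes_up_to(limit: int, workers: int = 4) -> list:
    candidates = range(2, limit + 1)
    with Pool(workers) as pool:
        flags = pool.map(is_prime, candidates)  # one independent task per candidate
    return [n for n, flag in zip(candidates, flags) if flag]

if __name__ == "__main__":
    print(primes_up_to(50))
    # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

The point is that each `is_prime` call touches no shared state, which is exactly the shape of problem that maps cleanly onto GPU threads.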

Okay, fair enough.

What do you like to program? Aside from something massively parallel, what is it that you like to make?

And what do you mean by “plugin”?

In my view, the simplest way to kill your bird would be to remove the “plugin” prerequisite, take a few projects/programs you have written before, and re-implement them on a parallel platform
Then, to put a smile on your professor’s face, compare the resultant performance with that of the existing serial implementations
Should save you a significant amount of project research (get-to-know-the-project-time)

I’ve already got the plugin part figured out; I just need some examples of problems I could solve in parallel using the GPU. For instance, what kinds of programs could I write that are well suited to the GPU?

“Plugin” merely describes an attribute or extension, and hardly functionality; hence from a problem or purpose perspective it is mostly redundant, I would think

The gpu marketing and related brochures and so forth normally list the areas to which gpus have been applied - for instance, finance, science, graphics, etc; but this serves only as a broad classification, in your case

The cuda samples would yield more definitive cases, but are void of context or rich problem encapsulation (they contain ‘how-tos’, not ‘whys’)

I suppose one could think of a special ‘set’ of problems that gpus really solve well; but at the same time I find it limiting
Chances are, if it can be solved by means of a serial implementation, it can be solved by means of a parallel implementation just as well
Hence, the suggestion to first search within your background for problems to solve with a gpu

Okay, well, one thing GPUs are good at is processing large sets of parallelizable data. The data sets need to be large and require as few memory transfers as possible; otherwise CPU implementations will probably be faster.

This is due to GPUs generally having lower clock speeds, smaller caches, and the latency caused by memory transfers from the host to the device.

So you need a problem where doing 10,000 things at once slowly is faster than doing 4 things at once incredibly fast.
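To put rough numbers on that, here’s a toy back-of-the-envelope model (every figure is invented for illustration, not a real hardware spec) showing how 10,000 slow lanes can beat 4 fast ones once the data set is big enough:

```python
# Toy throughput model for the "many slow vs few fast" trade-off.
# Numbers are made up for illustration, not real hardware specs.
def time_to_process(n_items, ops_per_item, lanes, ops_per_sec_per_lane):
    """Seconds to chew through n_items when `lanes` workers run in lockstep waves."""
    waves = -(-n_items // lanes)  # ceiling division: how many batches of work
    return waves * ops_per_item / ops_per_sec_per_lane

# 1M items, 100 ops each: 4 fast cores vs 10,000 slow lanes
cpu = time_to_process(1_000_000, 100, lanes=4,      ops_per_sec_per_lane=4e9)
gpu = time_to_process(1_000_000, 100, lanes=10_000, ops_per_sec_per_lane=1e9)
print(f"CPU: {cpu:.2e}s  GPU: {gpu:.2e}s")  # the wide-but-slow side wins
```

Of course this ignores memory-transfer latency, which is exactly why small data sets flip the result back in the CPU’s favor.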

But it sounds to me like you’re trying to force the GPU into being incorporated instead of finding an organic need for it which makes me say, “Lol undergrads” even though I’m really just a B.Sc. in physics XD

“So you need a problem where doing 10,000 things at once slowly is faster than doing 4 things at once incredibly fast.”

Does this not merely argue the case for task parallelization? What about instance parallelization?

I’m sorry, I tried googling it but what is “instance parallelization”?

Instance parallelization would be, in addition to parallelizing the code or algorithm functionality, equally and further parallelizing the (now-parallelized) algorithm implementation

In simpler words, for applications that permit this, running far more instances than the cpu can only dream of, to the point that the cpu bursts out in tears, running to its mother weeping, shouting: fallacy! along the way, unable to shrug off the feeling of utter despair…

That sounds interesting. Do you have any particular examples? For example, in modifying a simple array, we can use a thread per array index. This would be task-level parallelization. How would we modify this even further?

Well, I can refer to embarrassingly parallel problems in general

most embarrassingly parallel problems would find gpus a welcome home, as, in my view, embarrassingly parallel under cpus is not the same as embarrassingly parallel under gpus
With the latter, you parallelize both at the sub-instance level, and at the instance level

In a number of cases, the task to be executed can be broken down in identical, but independent instances
You can then parallelize the task, and its instances

A number of financial applications are embarrassingly parallel - you have to do the same task on a data set; you can then parallelize the task, and subdivide the data set between a number of instances
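A rough sketch of that two-level idea in plain Python might look like the following. Threads here only illustrate the structure (on a GPU the instances and their inner work would map onto blocks and threads, and Python’s GIL means this particular version gains little real speed); every name is invented for illustration:

```python
# Two-level parallelism: the same task runs as several independent instances
# (instance level), and each instance splits its slice of the data across
# its own workers (sub-instance level).
from concurrent.futures import ThreadPoolExecutor

def task(x):
    """The per-element work, e.g. revaluing one position in a portfolio."""
    return x * x

def run_instance(chunk, workers=4):
    """One instance: parallelize the task across its own slice of the data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, chunk))

def run_all(data, n_instances=5):
    """Subdivide the data set between instances and run them concurrently."""
    size = -(-len(data) // n_instances)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        results = pool.map(run_instance, chunks)  # order is preserved
    return [y for chunk in results for y in chunk]

print(run_all(list(range(10))))  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The financial case above fits this shape directly: `task` is the shared computation, and each instance owns one slice of the portfolio.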

I know of a gpu implementation with 45 instances executing concurrently