Are there requirements for choosing which parts of the code are going to be parallelized?
if (code.can_be_parallelized())
    code.parallelize()
I know, this isn’t the answer you are looking for. But every bit of code has to be evaluated on a case-by-case basis. There are no hard and fast rules about what can and cannot be parallelized. Even seemingly 100% serial operations like:
for (int i = 1; i < n; i++)
    a[i] += a[i-1];
have very efficient parallel implementations (the example I gave is called a “scan”).
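For what it’s worth, here is a minimal sketch of that loop running as a parallel scan on the GPU, using Thrust’s inclusive_scan (this assumes a CUDA toolkit with Thrust available; the data size and values are made up for illustration):

// Minimal sketch: the serial loop above as a parallel prefix sum (scan) on the GPU.
// Assumes a CUDA toolkit with Thrust; build with nvcc.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> a(n, 1.0f);   // example data: all ones

    // Equivalent to: for (int i = 1; i < n; i++) a[i] += a[i-1];
    thrust::inclusive_scan(a.begin(), a.end(), a.begin());

    printf("a[n-1] = %f\n", (float)a[n - 1]);  // expect n = 1048576
    return 0;
}

Under the hood this runs as a multi-pass parallel scan, which is why the apparent loop-carried dependence is not a show-stopper.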
There is a general rule of thumb for answering the question “is it worth parallelizing on the GPU?”: yes, if you can run at least ~5,000 independent threads. Of course, every rule has its exceptions. Say you have an algorithm with steps A, B, C, D. A, B, and D parallelize nicely on the GPU with tens of thousands of threads, but C only runs a few hundred. It may still be worth putting C on the GPU, even though it might be slower than host code, just to avoid copying all the memory from the device to the host and then back again.
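To make the A/B/C/D point concrete, here is a hypothetical sketch (the kernels and sizes are made up, not from any real codebase) where the poorly parallel step C still runs on the device so the data never leaves GPU memory:

// Hypothetical sketch: keep the whole A -> B -> C -> D pipeline resident on the
// device, even though step C only exposes a few hundred threads, to avoid
// device->host->device copies around C. Kernel names and work are placeholders.
#include <cuda_runtime.h>

__global__ void kernelA(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }
__global__ void kernelB(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }
__global__ void kernelC(float* d, int n) { int i = threadIdx.x; if (i < n && i < 256) d[i] -= 0.5f; }  // only a few hundred threads of work
__global__ void kernelD(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= d[i]; }

void run_pipeline(float* d_data, int n) {
    const int block = 256;
    const int grid  = (n + block - 1) / block;

    kernelA<<<grid, block>>>(d_data, n);
    kernelB<<<grid, block>>>(d_data, n);

    // The alternative is cudaMemcpy to the host, running C on the CPU, and
    // copying back. Even if kernelC alone is slower than host code, skipping
    // those two transfers can make the whole pipeline faster.
    kernelC<<<1, block>>>(d_data, n);

    kernelD<<<grid, block>>>(d_data, n);
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    run_pipeline(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}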
Yes, the slow ones :D
But seriously, first you need to find the bottlenecks of your current software (assuming you have something running), and then start thinking about how to parallelize it.
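As a starting point (before reaching for a full profiler such as Nsight Systems), here is a minimal sketch of simply timing the candidate hot spots with std::chrono; stepA and stepB are placeholders standing in for your own routines, not real APIs:

// Minimal sketch: time candidate hot spots before deciding what to parallelize.
// stepA/stepB are placeholders for your own code; the work here is dummy arithmetic.
#include <chrono>
#include <cstdio>

double stepA() { double s = 0; for (int i = 0; i < 10000000; i++) s += i * 0.5; return s; }
double stepB() { double s = 0; for (int i = 0; i < 50000000; i++) s += 1e-7; return s; }

template <typename F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    double sink = 0;   // keeps the compiler from discarding the dummy work
    printf("stepA: %.2f ms\n", time_ms([&] { sink += stepA(); }));
    printf("stepB: %.2f ms\n", time_ms([&] { sink += stepB(); }));
    printf("(checksum: %g)\n", sink);
    return 0;
}

Whichever step dominates the wall-clock time is the one worth thinking about parallelizing first.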