WARNING
LARGE AMOUNT OF TEXT
Here is the problem:
I have written a program to simulate behavior of a certain framework in certain situation.
This is how it looks like before and after calculations. (this was like 4hours of counting for my pc).
Printing is done using OpenGL and this is not a part of problem.
While writing program i didnt know about CUDA technology and now i would like to use it to make it faster.
Problem is- I have a really hard time understanding whole thing and am searching for ANY help- tips, code written,
simply anything.
Informations:
Framework looks like in the picture( one in beginning of simulation, second after some time).
I am putting force on some nodes in the top and and is standing on something with bottom middle part, so these are
not calculated.
Framework looks like this:
class framework
{
public:
vector<vector<nod>> v; //2-dimensional vector of nodes
float k; //elasticity coefficient, needed for calculations
float l; //standard distance between nodes, needed for calculations
kratownica(void){};
~kratownica(void){};
kratownica(vector<vector<nod>> vv, float kk, float ll)
{
v = vv;
k = kk;
l = ll;
}
};
Some math:
Each node is connected with some other nodes. Each of those nodes is trying to put calculated node in some distance from him.
In math words: force to neighbouring nodes is calculated as f = k(l - r), where k and l are from framework
and r is actual distance between nodes.
Considering this- every node is place in vector field implied by neighbouring nodes and in every position of this field
some force works on this node. So there is almost always possibility to find a better place, which means place with
weaker force.
Add some force from top in some top nodes and this is everything program does. I mean, finding better places for nodes
using newtons method.
Now for a code:
void move(framework &k, float f) //f is amount of force
{
for(int i = 0; i < k.v.size(); i++)
{
for(int j = 0; j < k.v.size(); j++)
{
if(i == 0 && j >= (k.v.size() +1)/3 && j <= (2 * k.v.size() + 1) / 3 - 1) //points with force added
step(k, i, j, f);
else if(i == k.v.size()-1 && j >= (k.v.size() +1)/3 && j <= (2 * k.v.size() + 1) / 3 - 1) //points not calculated
true;
else
step(k, i, j, 0); //rest
}
}
}
This is basically the main part that needs to get paralelled. I believe, as it is just 2 fors, it should be possible to do quite well.
What it does is of course finding better places for every node.
Now for function step, here comes the heavy math:
float step(framework &k, int i, int j, float f) //moving node at coordinates i, j in framework k
{
vector<float> r; //vector of distances between given node and neighbours (or, to be exact, reverse of it)
vector<float> dx; //vector of differences by x axis
vector<float> dy; //vector of differences by y axis, all 3 needed for calculations
float l = k.l; //standard distance between nodes
float kk = k.k; //elasticity coefficient
nod aa = k.v[i][j]; // node we work at,
vector<nod> v1 = addnodes1(k, i, j);
vector<nod> v2 = addnodes2(k, i, j); //these are 2 vectors. One holds nodes neighbouring by straight lines (at top, at bottom, at left, at right),
//second one holds neighbours at top-left, top-right etc.
//they are separated becouse the second ones are at sqrt(2)*l standard distance, which must be noted in calculations.
for(int i = 0; i < v1.size(); i++) //filling r, dx, dy with informations
{
r.push_back(Q_rsqrt((aa.x - v1[i].x)*(aa.x - v1[i].x) + (aa.y - v1[i].y)*(aa.y - v1[i].y)));
dx.push_back(aa.x - v1[i].x);
dy.push_back(aa.y - v1[i].y);
}
for(int i = 0; i < v2.size(); i++)
{
//r.push_back( odleglosc(aa, v2[i]));
r.push_back(Q_rsqrt((aa.x - v2[i].x)*(aa.x - v2[i].x) + (aa.y - v2[i].y)*(aa.y - v2[i].y)));
dx.push_back(aa.x - v2[i].x);
dy.push_back(aa.y - v2[i].y);
}
//BEGINNING of counting force in place nod is now.
float Fx = 0;
float Fy = 0;
for(int i = 0; i < v1.size(); i++)
{
Fx += (l * r[i] - 1) * dx[i];
Fy += (l * r[i] - 1) * dy[i];
}
for(int i = v1.size(); i < v1.size() + v2.size(); i++)
{
Fx += (1.4142 * l * r[i] - 1) * dx[i];
Fy += (1.4142 * l * r[i] - 1) * dy[i];
}
Fx *= -kk;
Fy *= -kk;
Fy += f; //adding force from outside
// END of calculating force in place nod is now
float norm = abs(Fx) + abs(Fy); //norm of this place
if(norm < 0.00001) //if node is in place good enough, no need for further calculations
return norm;
float a = 0;
float b = 0;
float c = 0;
/*
a, b and c is a matrix:
a b
b c
This matrix is a Jacobian of force in point nod is now- needed to use Newtons method
*/
for(int i = 0; i < v1.size(); i++) //calculations of jacobian.
{
a += 1 - l * r[i] + l * dx[i] * dx[i] * (r[i] * r[i] * r[i]);
b += l * dx[i] * dy[i] * (r[i] * r[i] * r[i]);
c += 1 - l * r[i] + l * dy[i] * dy[i] * (r[i] * r[i] * r[i]);
}
for(int i = v1.size(); i < v1.size() + v2.size() ; i++)
{
a += 1 - 1.4142 * l * r[i] + 1.4142 * l * dx[i] * dx[i] * (r[i] * r[i] * r[i]);
b += 1.4142 * l * dx[i] * dy[i] * (r[i] * r[i] * r[i]);
c += 1 - 1.4142 * l * r[i] + 1.4142 * l * dy[i] * dy[i] * (r[i] * r[i] * r[i]);
}
a *= kk;
b *= kk;
c *= kk;
float det = a*c - b*b;
float A1 = c/det;
float A2 = -b/det;
float A3 = a/det;
// A1, A2 i A3 is reversed JAcobian
k.v[i][j].x -= A1*Fx + A2*Fy;
k.v[i][j].y -= A2*Fx + A3*Fy; // putting node to better place using newtons method.
return norm; //returning old norm.
}
I guess this is everything needed.
Task here is to make calculations in “move” function parallel using CUDA and i am nicely asking for ANY help.
Code itself probably needs to be reorganised for this, but i dont really know how should i do it.
If anyone got throught this all, thanks in advance.
Also, sorry for possible bad english, its not my native.
Also, feel free to ask questions.