I’ve been trying to solve this issue for weeks, and since I don’t understand how this works, I hoped someone here could help me.
Here is the source code I want to parallelize. What I want to do is quite simple: remove the host-side for loop over the integer k and have SimulatePath handle every value of k in parallel instead.
Currently it is not parallelized at all, since I can’t figure out how threads are scheduled.
Simply replacing the loop variable k with a thread number does not seem to work, and neither does using a block number. I’ve checked the NVIDIA samples too and I still can’t understand why it isn’t working.
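// Kernel: simulates the single path with index k.
__global__ void SimulatePath(TOptionPlan plan, int k)
{
    // The first step of path k uses the initial coefficients m_A0 and m_B0.
    float result = plan.p0 * exp(plan.m_A0 + plan.m_B0 * plan.d_Samples[k * plan.pathN]);
    // The remaining steps read samples k*pathN + 1 .. k*pathN + pathN - 1.
    for (int i = 2 + k * plan.pathN; i <= plan.pathN + k * plan.pathN; i++) {
        result = result * exp(plan.m_A + plan.m_B * plan.d_Samples[i - 1]);
    }
    // Store the final value once, after the loop.
    plan.d_Buffer[k].Expected = result;
}

// Host code: one single-thread launch per option, so nothing runs in parallel yet.
for (int k = 0; k < plan.optionCount; k++)
{
    SimulatePath<<<1, 1, 0>>>(plan, k);
}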
I must be missing a really important point, but I can’t figure out what it is. It should be easy :(
Thanks a lot for your help. I’ve been trying to solve this for weeks…
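For what it's worth, here is a minimal sketch of the usual approach: launch one thread per option and let each thread derive its own k from threadIdx.x, so the host loop disappears. PlanSketch and PathResult are hypothetical stand-ins for TOptionPlan and the buffer element type, whose definitions aren't shown in the post. The names SimulatePathOneBlock and LaunchOneBlock are also made up for illustration. This version uses a single block, so it only works while optionCount stays within the device's per-block thread limit (512 on older GPUs, 1024 on more recent ones).

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical stand-ins for the real types, which are not shown in the post.
struct PathResult { float Expected; };

struct PlanSketch {
    float p0, m_A0, m_B0, m_A, m_B;
    int pathN;                // samples per path
    int optionCount;          // number of paths
    const float *d_Samples;   // device array, optionCount * pathN values
    PathResult *d_Buffer;     // device array, optionCount entries
};

// One thread per path: threadIdx.x plays the role of the old loop variable k.
__global__ void SimulatePathOneBlock(PlanSketch plan)
{
    int k = threadIdx.x;
    if (k >= plan.optionCount) return;

    // Samples belonging to path k.
    const float *samples = plan.d_Samples + k * plan.pathN;

    float result = plan.p0 * expf(plan.m_A0 + plan.m_B0 * samples[0]);
    for (int i = 1; i < plan.pathN; i++) {
        result *= expf(plan.m_A + plan.m_B * samples[i]);
    }
    plan.d_Buffer[k].Expected = result;
}

// Host side: a single launch replaces the for loop over k.
// Assumption: optionCount fits in one block on the target device.
cudaError_t LaunchOneBlock(const PlanSketch &plan)
{
    SimulatePathOneBlock<<<1, plan.optionCount>>>(plan);
    cudaError_t err = cudaGetLastError();   // reports an invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for the kernel and reports runtime errors
}
```

Calling LaunchOneBlock(plan) once takes the place of the whole host loop. If it returns an error, the launch configuration (too many threads for one block) is the first thing to check.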
Thanks a lot :) It’s working well for optionCount <= 512.
That was my mistake then… I hadn’t paid attention to the fact that only 512 threads can be launched simultaneously on the card.
In my case, I need to run many more options than 512 (around 10,000 or 100,000). :( What should I do? Use a global memory array to store intermediate results?
That doesn’t seem very optimized…
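A sketch of one way to go beyond 512 options, in case it helps. The 512 figure is a limit per block rather than for the whole card, and a single launch can contain many blocks. Each thread can then compute a global index from blockIdx.x, blockDim.x and threadIdx.x. PlanSketch, PathResult, SimulatePathGrid and LaunchGrid are hypothetical names used for illustration, since the real TOptionPlan definition isn't shown, and the block size of 256 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical stand-ins for the real types, which are not shown in the post.
struct PathResult { float Expected; };

struct PlanSketch {
    float p0, m_A0, m_B0, m_A, m_B;
    int pathN;                // samples per path
    int optionCount;          // number of paths
    const float *d_Samples;   // device array, optionCount * pathN values
    PathResult *d_Buffer;     // device array, optionCount entries
};

// One thread per path, spread over as many blocks as needed.
__global__ void SimulatePathGrid(PlanSketch plan)
{
    // Global thread index across the whole grid.
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may be only partially used.
    if (k >= plan.optionCount) return;

    const float *samples = plan.d_Samples + k * plan.pathN;

    // The running product stays in a local variable for the whole path.
    float result = plan.p0 * expf(plan.m_A0 + plan.m_B0 * samples[0]);
    for (int i = 1; i < plan.pathN; i++) {
        result *= expf(plan.m_A + plan.m_B * samples[i]);
    }
    plan.d_Buffer[k].Expected = result;
}

// Host side: enough blocks to cover every option, rounding up.
cudaError_t LaunchGrid(const PlanSketch &plan)
{
    if (plan.optionCount <= 0) return cudaSuccess;   // nothing to do

    const int threadsPerBlock = 256;   // arbitrary choice, below the 512 limit
    int blocks = (plan.optionCount + threadsPerBlock - 1) / threadsPerBlock;

    SimulatePathGrid<<<blocks, threadsPerBlock>>>(plan);
    cudaError_t err = cudaGetLastError();   // reports an invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for the kernel and reports runtime errors
}
```

With 100,000 options and 256 threads per block this launches 391 blocks. Each thread keeps its running product in a local variable and writes only the final value to d_Buffer, so an extra global array for intermediate results should not be needed.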