I'm currently working on a CFD simulation program, and I have a problem when computing the index for a finite difference. The boundaries are periodic, so I wrap the index with a modulo. To make that faster, the resolution has to be 2^n, so the modulo can be replaced by a bitwise AND with (KsiMax-1).
The problem is the following (col=0; row=1; KsiMax=512, inside kernel functions):
[codebox] (col+2)&(KsiMax-1)+row*KsiMax //->result 2 according to CUDA [/codebox]
[codebox] int Value=row*KsiMax;
Value+=(col+2)&(KsiMax-1); // ->result 514 according to CUDA[/codebox]
I really have no idea why CUDA evaluates the first expression to 2, when I expect 2+512=514 as in the second version.
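One likely explanation, assuming CUDA follows the usual C/C++ operator precedence: binary `+` binds tighter than `&`, so the one-line expression is parsed as `(col+2) & ((KsiMax-1) + row*KsiMax)`, which is `2 & 1023 = 2`. The two-statement version does the AND first and then adds, giving 514. Below is a minimal host-side sketch of both readings. The function names are hypothetical and only serve the illustration, and the values are the ones from the post.

```cpp
// Hypothetical helpers illustrating the two readings of the index expression.
// Plain C++11; the checks run at compile time via static_assert.

// How the compiler reads (col+2)&(KsiMax-1)+row*KsiMax:
// '+' has higher precedence than '&', so the addition happens first.
constexpr int index_as_parsed(int col, int row, int KsiMax) {
    return (col + 2) & ((KsiMax - 1) + row * KsiMax);
}

// What was intended: wrap the column first, then add the row offset.
// The extra parentheses around the '&' term force that order.
constexpr int index_intended(int col, int row, int KsiMax) {
    return ((col + 2) & (KsiMax - 1)) + row * KsiMax;
}

// Values from the post: col=0, row=1, KsiMax=512.
static_assert(index_as_parsed(0, 1, 512) == 2,   "2 & (511 + 512) = 2 & 1023 = 2");
static_assert(index_intended(0, 1, 512) == 514, "(2 & 511) + 512 = 514");
```

If this is indeed the cause, wrapping the `&` term in its own parentheses in the kernel should make the one-line form return 514 as well.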