I am using a matrix of [980*1660] = 1635100 cells, stored as a float array of [1635100] elements in GPU global memory.
long newIndex = 0;
float neighbourMinVal = 1000000;
float currNeigbourValue;
// size_X = 980: subtracting size_X moves up one row; the extra -1 moves to the top-left corner of the window (8 neighbours)
long topLeft = currThreadID - size_X - 1;
for (int k = 0; k < 9; k += 1)
{
    newIndex = getNewIndex(topLeft, k, size_X);
    if (!(newIndex == currThreadID || newIndex < 0 || newIndex >= size_Mat))
    {
        currNeigbourValue = planchonMatrix[newIndex];
        if (currNeigbourValue < neighbourMinVal)
        {
            neighbourMinVal = currNeigbourValue;
        }
    }
The question is:
neighbourMinVal does not hold the correct minimum value after the loop; it is 1000000 most of the time. But if I add a printf("neighbourMinVal") or any other print statement inside the loop to print the value at each iteration, then it gives me the correct minimum value.
I want to know why this happens and how I can solve it.
Is this device code or host code?
And are you running a debug build or a release build when you get these results?
I can well understand why you would get what you get if this is device code run as a release build:
a) neighbourMinVal is a local variable, and b) it is updated within a loop.
The optimizer may very well consider only the last iteration of the loop, as there seems to be little dependency between iterations and no intermediate store to a 'fixed' location such as global or shared memory.
Either that, or you may have a race, which is difficult to determine, as you only posted half of the if section / for loop.
When seeking help in debugging one's code, it is always a good idea to post a complete, compilable and runnable example. The code posted cannot be the complete kernel. For example, there is no __global__ function shown; currThreadID and size_Mat do not seem to be defined anywhere. Your problem may also be rooted in the host portion of your code (e.g. failed memory allocation, invalid launch configuration), which is not shown.
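For illustration, a complete, runnable reproduction could look roughly like the sketch below. It is not the original poster's kernel. getNewIndex() was never posted, so the version here is a hypothetical stand-in that walks a 3x3 window in row-major order; the kernel name neighbourMin, the output array minOut, the CUDA_CHECK macro, the host wrapper runNeighbourMin and the 4x4 test matrix are likewise assumptions made for this example. The loop body is the one from the question, the result is stored to global memory so the host can inspect it, and every runtime call and the kernel launch are checked for errors.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical stand-in for the getNewIndex() that was not posted:
// linear index of cell k (0..8) of the 3x3 window that starts at topLeft.
__host__ __device__ long getNewIndex(long topLeft, int k, long size_X)
{
    return topLeft + (k / 3) * size_X + (k % 3);
}

// One thread per cell; the loop is the one from the question.
__global__ void neighbourMin(const float *planchonMatrix, float *minOut,
                             long size_X, long size_Mat)
{
    long currThreadID = (long)blockIdx.x * blockDim.x + threadIdx.x;
    if (currThreadID >= size_Mat) return;   // guard the partial last block

    float neighbourMinVal = 1000000.0f;
    long topLeft = currThreadID - size_X - 1;
    for (int k = 0; k < 9; ++k) {
        long newIndex = getNewIndex(topLeft, k, size_X);
        if (!(newIndex == currThreadID || newIndex < 0 || newIndex >= size_Mat)) {
            float currNeighbourValue = planchonMatrix[newIndex];
            if (currNeighbourValue < neighbourMinVal)
                neighbourMinVal = currNeighbourValue;
        }
    }
    minOut[currThreadID] = neighbourMinVal;  // make the result visible to the host
}

// Host wrapper: copy in, launch, check for errors, copy out.
std::vector<float> runNeighbourMin(const std::vector<float> &in, long size_X)
{
    long size_Mat = (long)in.size();
    size_t bytes = in.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dIn, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    CUDA_CHECK(cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (int)((size_Mat + threads - 1) / threads);
    neighbourMin<<<blocks, threads>>>(dIn, dOut, size_X, size_Mat);
    CUDA_CHECK(cudaGetLastError());        // reports an invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran

    std::vector<float> out(in.size());
    CUDA_CHECK(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
    return out;
}

int main()
{
    // Small 4x4 test matrix holding the values 1..16 in row-major order.
    const long size_X = 4;
    std::vector<float> in(16);
    for (int i = 0; i < 16; ++i) in[i] = (float)(i + 1);

    std::vector<float> out = runNeighbourMin(in, size_X);
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) printf("%6.1f", out[r * size_X + c]);
        printf("\n");
    }
    return 0;
}
```

Something like this can be built with nvcc in both debug (-G) and release configurations to see whether the results differ. With the assumed getNewIndex(), the bounds test from the question only rejects indices outside [0, size_Mat), so cells in the first and last columns also read values that wrap around from the adjacent row; whether that matters depends on the real getNewIndex().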