loop + sorting advice

Hi all,

I’m implementing my Matlab code on CUDA and would like to ask some advice before starting. I basically have three (image) matrices of which the values need to be compared to one another pixel per pixel. There

are 3.2.1 = 6 possibilities for this, so my output matrix is of the same size as the input matrices but contains values of 1,2,3,4,5 or 6, depending on the relative values of the input matrices.

Second, the three input matrices need to be sorted according to pixel values, pixel per pixel. After this, both the sorted ‘stack’ of matrices and the output matrix containing the 6 possible sorting orders are

combined in a final calculation which gives me some sort of surface measurement.

As speed is of the essence, and I believe I can achieve significant speed-ups by removing the nested for loops that I now use in Matlab, I’m trying to go with CUDA. Any comments on how to approach this problem

in CUDA (generally, useful functions, memory tips) would be greatly appreciated.

Below you find my abbreviated Matlab source code,

 = size(phase1);

total = phase1 + phase2 + phase3;

%Concatenate the 3 phase images 

stack = cat(3, phase1, phase2, phase3);

%Sort phase images in depth, 6 possibilities per pixel.

N = ones(sizey, sizex);

for i = 1:sizey

    for j = 1:sizex

if phase1(i,j) >= phase2(i,j)

        if phase2(i,j) >= phase3(i,j)

                    N(i,j) = 1;

        else

            if phase1(i,j) >= phase3(i,j)

                    N(i,j) = 6;

            else 

                    N(i,j) = 5;

            end

        end

elseif phase1(i,j) <= phase2(i,j)   

        if phase2(i,j) <= phase3(i,j)

                    N(i,j) = 4;

            else

            if phase1(i,j) >= phase3(i,j)

                    N(i,j) = 2;

            else 

                    N(i,j) = 3;

            end

        end

    end

    end

end

stack = sort(stack,3);

Imin = stack(:,:,1);

Imed = stack(:,:,2);

Imax = stack(:,:,3);

%Calculate intensity ratio

r = (Imed - Imin)./(Imax - Imin); 

WrappedPhase = 2*round((N-1)/2) + power(-1, (N+1)).*r;