Hi all,

I’m porting my Matlab code to CUDA and would like some advice before I start. I have three (image) matrices of equal size whose values need to be compared with one another pixel by pixel. There are 3! = 3·2·1 = 6 possible orderings per pixel, so my output matrix has the same size as the input matrices and contains a value from 1 to 6, depending on the relative values of the three inputs at that pixel.

Second, the three input matrices need to be sorted by pixel value, pixel by pixel. After this, the sorted ‘stack’ of matrices and the output matrix containing the 6 possible sorting orders are combined in a final calculation, which gives me a kind of surface measurement.

Speed is of the essence, and I believe I can get a significant speed-up by removing the nested for loops that I currently use in Matlab, so I’m trying to go with CUDA. Any comments on how to approach this problem in CUDA (general approach, useful functions, memory tips) would be greatly appreciated.

Below is my abbreviated Matlab source code:

```
[sizey, sizex] = size(phase1);
total = phase1 + phase2 + phase3;
%Concatenate the 3 phase images
stack = cat(3, phase1, phase2, phase3);
%Sort phase images in depth, 6 possibilities per pixel.
N = ones(sizey, sizex);
for i = 1:sizey
    for j = 1:sizex
        if phase1(i,j) >= phase2(i,j)
            if phase2(i,j) >= phase3(i,j)
                N(i,j) = 1;
            else
                if phase1(i,j) >= phase3(i,j)
                    N(i,j) = 6;
                else
                    N(i,j) = 5;
                end
            end
        elseif phase1(i,j) <= phase2(i,j)
            if phase2(i,j) <= phase3(i,j)
                N(i,j) = 4;
            else
                if phase1(i,j) >= phase3(i,j)
                    N(i,j) = 2;
                else
                    N(i,j) = 3;
                end
            end
        end
    end
end
stack = sort(stack,3);
Imin = stack(:,:,1);
Imed = stack(:,:,2);
Imax = stack(:,:,3);
%Calculate intensity ratio
r = (Imed - Imin)./(Imax - Imin);
WrappedPhase = 2*round((N-1)/2) + power(-1, (N+1)).*r;
```
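To make the question more concrete, here is a rough sketch of how I imagine the per-pixel work could look outside Matlab. Every pixel is independent, so the ordering test, the three-way sort and the final formula can all live in one small function that is called once per pixel. The names `PixelResult`, `process_pixel`, `process_image_host` and `process_image_kernel` are my own placeholders, not anything from a library. The `HD` macro is an assumption that lets the same function compile as plain C++ (for checking against Matlab on the CPU) and as device code under nvcc. The sketch assumes single-precision images stored as flat arrays, and it takes the `else` branch where the Matlab code has `elseif`, which only differs for NaN inputs.

```cpp
#include <math.h>
#include <stddef.h>

// Under nvcc the per-pixel function is usable from host and device code;
// with a normal C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical result type: ordering code (1..6) and the wrapped phase value.
struct PixelResult {
    int order;
    float wrapped;
};

// Mirrors the Matlab loop body for a single pixel.
HD inline PixelResult process_pixel(float p1, float p2, float p3) {
    int n;
    if (p1 >= p2) {
        if (p2 >= p3)      n = 1;  // p1 >= p2 >= p3
        else if (p1 >= p3) n = 6;  // p1 >= p3 >  p2
        else               n = 5;  // p3 >  p1 >= p2
    } else {
        if (p2 <= p3)      n = 4;  // p1 <  p2 <= p3
        else if (p1 >= p3) n = 2;  // p2 >  p1 >= p3
        else               n = 3;  // p2 >  p3 >  p1
    }

    // Sorting three values needs only min/max operations.
    float lo  = fminf(p1, fminf(p2, p3));
    float hi  = fmaxf(p1, fmaxf(p2, p3));
    float med = fmaxf(fminf(p1, p2), fminf(fmaxf(p1, p2), p3));

    // Intensity ratio; 0/0 gives NaN when all three values are equal,
    // the same as the Matlab expression.
    float r = (med - lo) / (hi - lo);

    // 2*round((N-1)/2) equals 2*(N/2) in integer arithmetic for N = 1..6,
    // and (-1)^(N+1) is +1 for odd N and -1 for even N.
    float base = 2.0f * (float)(n / 2);
    float sign = (n & 1) ? 1.0f : -1.0f;

    PixelResult out = {n, base + sign * r};
    return out;
}

// CPU reference loop over flat arrays of `count` pixels.
void process_image_host(const float* p1, const float* p2, const float* p3,
                        int* order, float* wrapped, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        PixelResult res = process_pixel(p1[i], p2[i], p3[i]);
        order[i] = res.order;
        wrapped[i] = res.wrapped;
    }
}

#ifdef __CUDACC__
// One thread per pixel; the arrays are assumed to be in device memory.
__global__ void process_image_kernel(const float* p1, const float* p2,
                                     const float* p3, int* order,
                                     float* wrapped, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) {
        PixelResult res = process_pixel(p1[idx], p2[idx], p3[idx]);
        order[idx] = res.order;
        wrapped[idx] = res.wrapped;
    }
}
#endif
```

If this is roughly the right idea, I would launch the kernel with something like `process_image_kernel<<<(count + 255) / 256, 256>>>(...)` after copying the three images to the device. Since each pixel only reads the same flat index from each input, Matlab's column-major layout should not matter as long as all arrays use the same layout.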