I’m porting my Matlab code to CUDA and would like to ask for some advice before starting. I have three image matrices whose values need to be compared with one another, pixel by pixel. There are 3! = 3·2·1 = 6 possible orderings, so my output matrix has the same size as the input matrices but contains the values 1, 2, 3, 4, 5 or 6, depending on the relative order of the three inputs at each pixel.
Second, the three input matrices need to be sorted by pixel value, again pixel by pixel. After this, the sorted ‘stack’ of matrices and the output matrix holding the 6 possible sort orders are combined in a final calculation that gives me a surface measurement.
Speed is of the essence, and I believe I can get a significant speed-up by removing the nested for loops I currently use in Matlab, so I’m trying CUDA. Any comments on how to approach this problem in CUDA (general approach, useful functions, memory tips) would be greatly appreciated.
Below is my abbreviated Matlab source code:
[sizey, sizex] = size(phase1);
total = phase1 + phase2 + phase3;

%Concatenate the 3 phase images
stack = cat(3, phase1, phase2, phase3);

%Sort phase images in depth, 6 possibilities per pixel.
N = ones(sizey, sizex);
for i = 1:sizey
    for j = 1:sizex
        if phase1(i,j) >= phase2(i,j)
            if phase2(i,j) >= phase3(i,j)
                N(i,j) = 1;
            else
                if phase1(i,j) >= phase3(i,j)
                    N(i,j) = 6;
                else
                    N(i,j) = 5;
                end
            end
        elseif phase1(i,j) <= phase2(i,j)
            if phase2(i,j) <= phase3(i,j)
                N(i,j) = 4;
            else
                if phase1(i,j) >= phase3(i,j)
                    N(i,j) = 2;
                else
                    N(i,j) = 3;
                end
            end
        end
    end
end

stack = sort(stack,3);
Imin = stack(:,:,1);
Imed = stack(:,:,2);
Imax = stack(:,:,3);

%Calculate intensity ratio
r = (Imed - Imin)./(Imax - Imin);
WrappedPhase = 2*round((N-1)/2) + power(-1, (N+1)).*r;
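
Since every output pixel depends only on the three input values at the same position, the whole Matlab loop could plausibly collapse into one thread per pixel. The following is only a rough sketch of that idea, not a tuned implementation. All names (`wrapped_phase_pixel`, `wrapped_phase_kernel`, `wrapped_phase_host`, `HD`) are made up for illustration, and it assumes single-precision images stored as flat arrays of equal length. The per-pixel function is written so that it compiles both as plain C++ (for checking against Matlab) and as device code under nvcc. Pixels where all three values are equal give 0/0, i.e. NaN, just as in the Matlab code.

```cpp
#include <math.h>

// Hypothetical helper macro: mark functions as host+device under nvcc,
// and as ordinary functions under a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Per-pixel equivalent of the Matlab loop body plus the final formula.
// Writes the ordering code (1..6) to *n_out and returns the wrapped phase.
HD inline float wrapped_phase_pixel(float p1, float p2, float p3, int* n_out)
{
    int n;
    // Same branch structure as the Matlab code.
    if (p1 >= p2) {
        if (p2 >= p3)      n = 1;
        else if (p1 >= p3) n = 6;
        else               n = 5;
    } else {
        if (p2 <= p3)      n = 4;
        else if (p1 >= p3) n = 2;
        else               n = 3;
    }

    // Sorting three values only needs min, median and max.
    float imin = fminf(p1, fminf(p2, p3));
    float imax = fmaxf(p1, fmaxf(p2, p3));
    float imed = fmaxf(fminf(p1, p2), fminf(fmaxf(p1, p2), p3));

    // Intensity ratio; NaN when imax == imin, as in Matlab.
    float r = (imed - imin) / (imax - imin);

    // For n = 1..6, 2*round((n-1)/2) equals 2*(n/2) with integer division,
    // and (-1)^(n+1) is +1 for odd n, -1 for even n.
    float base = 2.0f * (float)(n / 2);
    *n_out = n;
    return (n % 2 == 1) ? base + r : base - r;
}

// CPU reference loop over a flat array, useful for comparing with Matlab.
inline void wrapped_phase_host(const float* p1, const float* p2, const float* p3,
                               float* wrapped, int* order, int num_pixels)
{
    for (int i = 0; i < num_pixels; ++i)
        wrapped[i] = wrapped_phase_pixel(p1[i], p2[i], p3[i], &order[i]);
}

#ifdef __CUDACC__
// One thread per pixel; all pointers are assumed to be device pointers.
__global__ void wrapped_phase_kernel(const float* p1, const float* p2, const float* p3,
                                     float* wrapped, int* order, int num_pixels)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < num_pixels)
        wrapped[idx] = wrapped_phase_pixel(p1[idx], p2[idx], p3[idx], &order[idx]);
}
#endif
```

With the images copied to the device, a launch along the lines of `wrapped_phase_kernel<<<(num_pixels + 255) / 256, 256>>>(d_p1, d_p2, d_p3, d_wrapped, d_order, num_pixels);` would cover every pixel (the `d_` pointers and the block size of 256 are placeholders). Because the operation is purely element-wise, Matlab's column-major layout does not matter here: a linear index over sizey*sizex elements is enough, and neighbouring threads read neighbouring addresses.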