I figured out what the problem is.

The problem is a loss of performance when indexing. Please see the code below: as soon as indexing is involved, GPU performance drops.

If I just multiply the matrices, the GPU is about 2.4 times faster (Elapsed time is 0.027648 seconds for CPU and Elapsed time is 0.011477 seconds for GPU).

But as soon as indexing is involved, the GPU is about 50 times slower than the CPU (Elapsed time is 0.002495 seconds for CPU and Elapsed time is 0.127313 seconds for GPU).

The less indexing there is, the smaller the slowdown. So the GPU doesn't seem to like indexing. Why is this happening?

M=rand(1000,500,'double');

N=rand(1000,500,'double');

tic

for i=1:500

M(:,i)=N(:,501-i).*N(:,i);

end

toc

%--------------

gpu=gpuDevice();

V=rand(1000,500,'gpuArray');

Y=rand(1000,500,'gpuArray');

wait(gpu)

tic

for i=1:500

Y(:,i)=V(:,501-i).*V(:,i);

end

wait(gpu)

toc

wait(gpu)

tic

for i=1:500

C=V(:,501-i).*V(:,i);

end

wait(gpu)

toc

A=V(:,1);

B=V(:,2);

wait(gpu)

tic

for i=1:500

D=A.*B;

end

wait(gpu)

toc

%--------------

tic

E=M'*N;

toc

wait(gpu)

tic

F=V'*Y;

wait(gpu)

toc

Output, in the order of the tic/toc sections above:

Elapsed time is 0.002495 seconds.

Elapsed time is 0.127313 seconds.

Elapsed time is 0.068272 seconds.

Elapsed time is 0.009520 seconds.

Elapsed time is 0.027648 seconds.

Elapsed time is 0.011477 seconds.
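
For the benchmark loop specifically, one thing that might be worth comparing is a version that reverses the column order once and multiplies whole matrices, so there is a single indexing operation instead of 500 indexed assignments. This is only a sketch, assuming Parallel Computing Toolbox and a supported GPU; the names Yloop and Yvec are introduced here for illustration, and the approach does not carry over directly to irregular column products.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and a supported GPU.
gpu = gpuDevice();
V = rand(1000, 500, 'gpuArray');

% Loop version from the benchmark: 500 indexed assignments into Yloop.
Yloop = zeros(1000, 500, 'gpuArray');
wait(gpu)
tic
for i = 1:500
    Yloop(:, i) = V(:, 501 - i) .* V(:, i);
end
wait(gpu)
toc

% Vectorized version: reverse the column order once, then multiply whole matrices.
wait(gpu)
tic
Yvec = V(:, end:-1:1) .* V;
wait(gpu)
toc

% Both versions should produce the same values.
assert(isequal(gather(Yloop), gather(Yvec)))
```

No timings are given for this version; it would have to be measured on the same machine as the numbers above.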

And how can I solve the problem? The code above is just an example. My actual code, where I see the problem, looks like this:

Y(:,26)=V1.*V2.*V12;

X=V1.*V3;

Y(:,27)=X.*V7;

Y(:,28)=X.*V8;

X=V1.*V6;

Y(:,29)=X.*V7;

Y(:,30)=X.*V8;

Y(:,31)=V1.*V7.*V11;

Y(:,32)=X8.*V1;

Y(:,33)=V1.*V11.*V12;

Y(:,34)=X2.*V7;

Y(:,35)=X2.*V9;

Y(:,36)=X2.*V11;

Y(:,37)=X2.*V12;

X=V2.*V3;

Y(:,38)=X.*V7;

Y(:,39)=X.*V12;

Y(:,40)=X.*V13;

X=V2.*V4;

Y(:,41)=X.*V7;

Y(:,42)=X.*V8;

X=V2.*V6;

Y(:,43)=X.*V8;

Y(:,44)=X.*V12;

X=V2.*V7;

Y(:,45)=X.*V7;

Y(:,46)=X.*V8;

Y(:,47)=X.*V9;

Y(:,48)=X.*V12;

X=V2.*V8;

Y(:,49)=X.*V8;

Y(:,50)=X.*V12;

Y(:,51)=X9.*V2;

X=V2.*V11;

Y(:,52)=X.*V11;

Y(:,53)=X.*V12;

X=V2.*V12;

Y(:,54)=X.*V12;

Y(:,55)=X.*V13;

Y(:,56)=X3.*V8;

Y(:,57)=X3.*V12;

Y(:,58)=X3.*V13;

X=V3.*V4;

Y(:,59)=X.*V8;

Y(:,60)=X.*V13;

Y(:,61)=V3.*V6.*V8;

X=V3.*V7;

Y(:,62)=X.*V7;

Y(:,63)=X.*V8;

Y(:,64)=X.*V9;

Y(:,65)=X.*V13;

X=V3.*V8;

Y(:,66)=X.*V8;

Y(:,67)=X.*V13;

Y(:,68)=Y(:,14).*V3;

X=V3.*V12;

Y(:,69)=X.*V12;

Y(:,70)=X.*V13;

X=V3.*V13;

Y(:,71)=X.*V13;

Y(:,72)=X.*V14;

Y(:,73)=V4.*V7.*V9;

Y(:,74)=V4.*V8.*V14;

X=V4.*V13;

Y(:,75)=X.*V13;

Y(:,76)=X.*V14;

Y(:,77)=X6.*V7;

Y(:,78)=X6.*V8;

Y(:,79)=X6.*V13;

X=V6.*V7;

Y(:,80)=X.*V7;

Y(:,81)=X.*V8;

Y(:,82)=X.*V12;

Y(:,83)=X.*V16;

Y(:,84)=X.*V17;

X=V6.*V8;

Y(:,85)=X.*V8;

Y(:,86)=X.*V11;

Y(:,87)=X.*V12;

Y(:,88)=X.*V13;

X=V6.*V11;

Y(:,89)=X.*V12;

Y(:,90)=X.*V13;

X=V6.*V12;

Y(:,91)=X.*V12;

Y(:,92)=X.*V16;

Y(:,93)=X13.*V6;

Y(:,94)=X7.*V8;

Y(:,95)=X7.*V9;

Y(:,96)=X7.*V11;

Y(:,97)=X7.*V12;

Y(:,98)=X7.*V14;

Y(:,99)=X7.*V16;

Y(:,100)=X7.*V17;

Y(:,101)=Y(:,1).*V8;

Y(:,102)=Y(:,1).*V9;

Y(:,103)=Y(:,1).*V17;

Y(:,104)=Y(:,1).*V18;