Hello everyone,
I am new to CUDA and I am working on a neural network project that I need your help with. To simplify the problem, let’s say I have a matrix A(100,500), a matrix B(100,500), a weight matrix W(100,100,100) and the resulting matrix C(100,100).
The Matlab code below computes one iteration i:
for k = 1:100
    % column k of C is the diagonal of A*W(:,:,k)*B'
    C(:,k) = diag(A*W(:,:,k)*B');
end
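Only the diagonal of A*W(:,:,k)*B' is kept, so each entry is a quadratic form: C(j,k) = sum over p and q of A(j,p) W(p,q,k) B(j,q). This means the second full matrix product never has to be formed.

The Python sketch below evaluates one iteration this way using plain nested lists. It makes a few assumptions that are not part of the original code:

- A and B have as many columns as each slice of W has rows, which is needed for A*W(:,:,k) to be defined.
- W is stored as K slices of size P×P.
- The function name `diag_products` is illustrative.

```python
def diag_products(A, W, B):
    """Return C with C[j][k] = A[j] * W[k] * B[j]', i.e. column k of C
    equals diag(A * W(:,:,k) * B').

    Assumed layout (illustrative, plain nested lists):
      A, B : N x P      (A[j][p])
      W    : K x P x P  (W[k][p][q] corresponds to MATLAB W(p,q,k))
      C    : N x K
    """
    N, K = len(A), len(W)
    P = len(A[0])
    C = [[0.0] * K for _ in range(N)]
    for k in range(K):
        Wk = W[k]
        for j in range(N):
            a, b = A[j], B[j]
            # Quadratic form a * Wk * b': only this diagonal entry is computed,
            # so the full N x N product A*Wk*B' is never formed.
            C[j][k] = sum(a[p] * sum(Wk[p][q] * b[q] for q in range(P))
                          for p in range(P))
    return C


# Tiny check: with W(:,:,1) = identity, C(j,1) is the dot product of row j of A and B.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
W = [[[1, 0], [0, 1]]]
print(diag_products(A, W, B))  # [[17], [53]]
```

In MATLAB the same saving follows from the identity diag(X*B') = sum(X.*B, 2), i.e. C(:,k) = sum((A*W(:,:,k)).*B, 2).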
I need to run 10 million iterations (i = 1 to 10e6). W stays the same from one iteration to the next, whereas A and B change with i (A = fct(i), B = fct(i), and therefore C = fct(i)). These 10 million iterations are needed to evaluate the objective function once; I then update the weight matrix W with a gradient descent algorithm.

First, I am wondering whether GPU programming could speed this up, since the problem is highly parallelizable. On my CPU, a single evaluation of the objective function would take hours or even days…
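To put "hours or days" into numbers, the work per objective evaluation can be counted directly. The sketch below rests on these assumptions:

- Each of the K slices costs N·P² multiply-adds for A*W(:,:,k), plus N·P for the diagonal-only reduction against B.
- P = 100, so that A*W(:,:,k) conforms with a 100×100 slice of W.
- The function name `multiply_adds` is hypothetical.

```python
def multiply_adds(iterations, N, P, K):
    """Multiply-add count for one objective evaluation, assuming each of the
    K slices costs N*P*P for A*W(:,:,k) plus N*P for the diagonal-only
    reduction against B."""
    per_slice = N * P * P + N * P
    return iterations * K * per_slice


# Sizes from the post, taking P = 100 so that A*W(:,:,k) conforms (assumption).
total = multiply_adds(iterations=10_000_000, N=100, P=100, K=100)
print(f"{total:.3e}")  # 1.010e+15
```

That is roughly 10^15 multiply-adds per objective evaluation, before counting the work needed to produce A and B for each i.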

For reference, I have a GeForce 740M with 384 cores and 2 GB of memory. Would it be possible to have each core execute the algorithm above? Or should I approach this differently?
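On mapping work to cores: CUDA is designed to run many more threads than there are cores, each doing a small independent piece of work, rather than one core per full iteration. One possible decomposition assigns one thread to each entry C(j,k).

The sketch below is a serial Python emulation of that idea. The names `thread_to_indices` and `emulate_grid` are hypothetical. The flat index t stands in for blockIdx.x * blockDim.x + threadIdx.x in a real kernel.

```python
def thread_to_indices(t, N):
    """Map a flat thread index t (in CUDA: blockIdx.x * blockDim.x + threadIdx.x)
    to one output entry (j, k) of C."""
    return t % N, t // N


def emulate_grid(A, W, B):
    """Serially emulate a 1-D grid with one 'thread' per entry C(j,k).

    Layout as assumed here: A, B are N x P nested lists, W is K x P x P
    with W[k][p][q] corresponding to MATLAB W(p,q,k)."""
    N, K, P = len(A), len(W), len(A[0])
    C = [[0.0] * K for _ in range(N)]
    for t in range(N * K):  # on a GPU these iterations would run concurrently
        j, k = thread_to_indices(t, N)
        a, b, Wk = A[j], B[j], W[k]
        # Each thread computes one quadratic form a * W(:,:,k) * b' on its own.
        C[j][k] = sum(a[p] * Wk[p][q] * b[q] for p in range(P) for q in range(P))
    return C
```

With N = K = 100 this gives 10,000 independent threads per iteration i. Since W is shared across iterations, entries for several values of i could also be computed in the same launch.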
Thank you very much for your help.