I want to do a matrix multiplication on the GPU. Before that, please clear up some supercomputing terms for me, or tell me if I'm missing anything.
I have a dual-socket Xeon E5470 machine, 4 cores per processor @ 3 GHz.
According to an AnandTech article, the E5472 manages 0.3 flops/cycle with GCC at -O3 optimisation.
That makes 3 * 10^9 * 0.3 = 9 * 10^8 FLOPS (floating-point operations per second).
Keeping this in mind, I ran a simple matrix multiplication program on the CPU
with 1000x1000 matrices:
This would require 3 * 10^3 * 10^3 operations (for the 3 matrices involved in the multiplication)
and 2 floating-point instructions per operation, since an addition and a multiplication are involved.
This makes it 2 * 3 * 10^6 = 6 * 10^6
The total time it should have taken is 6 * 10^6 / (9 * 10^8) = about 0.0067 sec,
but it is actually taking 4.4 seconds with GCC at -O3 optimisation.
Am I missing something? Please help soon.
Thanks in advance.
I've attached my code.
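
The attachment isn't reproduced here, so below is only a minimal sketch of the kind of naive triple loop being timed. It assumes square row-major float matrices, and the names naive_matmul and time_matmul are made up for illustration. It counts one multiply and one add per innermost iteration, so the counted total can be checked against the estimate above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Naive product C = A * B for n x n row-major matrices (hypothetical helper).
   Returns the number of floating-point operations performed: the innermost
   statement runs n * n * n times and does one multiply and one add. */
unsigned long long naive_matmul(size_t n, const float *a,
                                const float *b, float *c)
{
    unsigned long long flops = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
                flops += 2; /* one multiply + one add */
            }
            c[i * n + j] = sum;
        }
    }
    return flops;
}

/* Times one n x n product and prints the counted operations and the
   elapsed CPU time. Returns the operation count, or 0 if n is 0 or an
   allocation fails. */
unsigned long long time_matmul(size_t n)
{
    float *a = malloc(n * n * sizeof *a);
    float *b = malloc(n * n * sizeof *b);
    float *c = malloc(n * n * sizeof *c);
    unsigned long long flops = 0;

    if (n > 0 && a && b && c) {
        for (size_t i = 0; i < n * n; i++) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }
        clock_t start = clock();
        flops = naive_matmul(n, a, b, c);
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        /* Sanity check: every entry of C is a sum of n products 1 * 2. */
        assert(c[0] == 2.0f * (float)n);
        printf("n=%zu flops=%llu time=%.3f s\n", n, flops, secs);
    }
    free(a);
    free(b);
    free(c);
    return flops;
}
```

Calling time_matmul(1000) from main and building with gcc -O3 should give a setup comparable to the one described above, though the timing will depend on the machine.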