I have a basic question regarding speedup calculation.
I have a serial application designed to run on a quad-core CPU.
The time this serial application takes to execute on the quad-core CPU is t1.
Then, I parallelize this application using CUDA and run it on 512 GPU cores.
The time this parallelized application takes to execute on the 512 GPU cores is t2.
Now, I want to calculate the speedup of this CUDA parallelization.
My confusion is: which of the following comparisons is the correct one?
a) We compare the timings for 1 CPU core vs. 1 GPU core.
b) We compare the timings for four CPU cores vs. 512 GPU cores. (In this case, the speedup would be t1/t2, i.e., serial time over parallel time.)
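For concreteness, here is a minimal sketch of how t1 and t2 could be measured on the same workload; the saxpy kernel, problem size, and block size are just placeholders I made up, not my actual application. Whichever core-counting convention applies, the speedup itself comes out as t1/t2:

```cpp
// Sketch: measure t1 (serial CPU) and t2 (CUDA) for the same work,
// then report speedup = t1 / t2. saxpy is a placeholder workload.
#include <cstdio>
#include <vector>
#include <chrono>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 2.0f), y_cpu = y;

    // t1: serial CPU version, timed with wall-clock time (runs on one core).
    auto c0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < n; ++i) y_cpu[i] = a * x[i] + y_cpu[i];
    auto c1 = std::chrono::high_resolution_clock::now();
    double t1 = std::chrono::duration<double>(c1 - c0).count();

    // t2: CUDA version, timed with CUDA events. This times the kernel only;
    // including the host<->device copies gives a more conservative speedup.
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    saxpy<<<(n + 255) / 256, 256>>>(n, a, dx, dy);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double t2 = ms / 1000.0;

    printf("t1 = %.4f s, t2 = %.4f s, speedup = t1/t2 = %.1fx\n",
           t1, t2, t1 / t2);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```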