I am learning about latency and throughput and I have some questions.
Example: suppose I have an image processing application and two options for running it:
option A: use only my CPU, with 1 core, and
option B: use only my GPU, with, let's say, up to 10,000 cores.
I have a video of color images and would like to convert the frames in the video to grayscale.
If it takes the CPU core 1 microsecond (10^-6 sec) to convert one pixel of the image from RGB color to grayscale, and
0.5 milliseconds (0.5 * 10^-3 sec) for the GPU kernel (using a map operation, we can pass 10,000 pixels to the GPU to process in parallel without affecting per-pixel performance, meaning we can run 10,000 threads in parallel in appropriately sized blocks and grids),
then is the latency of the CPU 10,000 pixels * 10^-6 sec = 0.01 seconds, and
the latency of the GPU 0.5 milliseconds (0.0005 seconds) for all 10,000 threads, since they run in parallel and each pixel takes 0.5 milliseconds?
Am I missing something?
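To make my latency arithmetic concrete, here is a minimal Python sketch of how I am computing it. The constant and function names are my own, for illustration only. The timings are the assumed numbers from my example, not measurements, and I am assuming the GPU handles all 10,000 pixels in a single kernel launch.

```python
# Assumed numbers from my example (not measured on real hardware).
PIXELS = 10_000               # pixels processed in one batch
CPU_TIME_PER_PIXEL = 1e-6     # seconds per pixel on one CPU core
GPU_KERNEL_TIME = 0.5e-3      # seconds for one kernel launch covering all pixels


def cpu_batch_latency(pixels: int, time_per_pixel: float) -> float:
    """Time for one core to convert the pixels one after another."""
    return pixels * time_per_pixel


def gpu_batch_latency(kernel_time: float) -> float:
    """Time for the GPU if every pixel is converted in parallel in one launch."""
    return kernel_time


cpu_latency = cpu_batch_latency(PIXELS, CPU_TIME_PER_PIXEL)
gpu_latency = gpu_batch_latency(GPU_KERNEL_TIME)

print(f"CPU latency for the batch: {cpu_latency:.4f} s")
print(f"GPU latency for the batch: {gpu_latency:.4f} s")
```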
What about throughput?
Throughput of the CPU for 10,000 pixels:
10,000 pixels / 0.01 seconds = 1,000,000 pixels per second
Throughput of the GPU for 10,000 pixels (one thread per pixel):
10,000 pixels / 0.0005 seconds = 20,000,000 pixels per second
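The same throughput calculation as a small Python sketch, in case I made a slip by hand. The names are again hypothetical. The batch times are the ones from my example, and I am assuming the compute time is the only cost (nothing for copying the frame to or from the GPU).

```python
# Assumed batch size and batch times from my example (not measurements).
PIXELS = 10_000
CPU_BATCH_TIME = 0.01      # seconds for the CPU to convert 10,000 pixels
GPU_BATCH_TIME = 0.0005    # seconds for the GPU to convert 10,000 pixels


def throughput(items: int, seconds: float) -> float:
    """Items completed per second: work done divided by elapsed time."""
    return items / seconds


cpu_throughput = throughput(PIXELS, CPU_BATCH_TIME)
gpu_throughput = throughput(PIXELS, GPU_BATCH_TIME)
speedup = gpu_throughput / cpu_throughput

print(f"CPU throughput: {cpu_throughput:,.0f} pixels/s")
print(f"GPU throughput: {gpu_throughput:,.0f} pixels/s")
print(f"GPU/CPU ratio:  {speedup:.1f}x")
```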
Please help…I don’t know if I am getting these concepts correctly!