I have a question about how CUDA speedups are calculated. I see people claiming 10x, 100x, and so on.
How are they calculating those values?
In my case:
CUDA execution (including memory allocation and the memory copy from GPU to CPU): 1.16 ms
CPU execution: 30.01 ms
However, printing the output to a file (either the result copied back from the GPU or the result generated on the CPU) takes 50 ms.
How many-fold improvement have I achieved from CPU to GPU?
If I count only the execution of the program: ~26-fold (30.01 / 1.16)
If I add printing the output into the file: ~1.5-fold ((30.01 + 50) / (1.16 + 50))
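Both ratios come from the same formula, baseline time divided by accelerated time; the end-to-end one simply adds the same unaccelerated output time to both sides. A minimal C++ sketch (the function names are hypothetical, times in milliseconds):

```cpp
#include <cassert>

// Speedup of an accelerated version over a baseline: baseline / accelerated.
double speedup(double baseline_ms, double accelerated_ms) {
    assert(accelerated_ms > 0.0);
    return baseline_ms / accelerated_ms;
}

// End-to-end speedup when both versions also pay the same, unaccelerated
// output cost (here, printing the results to a file).
double end_to_end_speedup(double cpu_ms, double gpu_ms, double output_ms) {
    return speedup(cpu_ms + output_ms, gpu_ms + output_ms);
}
```

With the timings above, speedup(30.01, 1.16) is about 25.9 and end_to_end_speedup(30.01, 1.16, 50.0) is about 1.56.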
How should I measure the performance?
Thanks for your time.
PS: I would like to put the same question to NVIDIA's developers.
Are you outputting to the console? The terminal program can have a BIG effect on output speed! Also, some shells “sync” your program to the console output, so if your terminal is displaying slowly, your program is halted in the meantime.
If you’re on Linux, try the mrxvt terminal; it is ridiculously fast. You could also redirect the output straight to a text file to see if that helps the speed, or pipe it through tee /dev/null, which desynchronises processing and lets your program continue faster.
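On the program side, a related cost is writing results line by line: when stdout is a terminal it is usually line-buffered, so each line can turn into a separate write. A minimal sketch, assuming the results are floats going to a file; write_results is a hypothetical name. It formats everything into one in-memory buffer and hands it to a single fwrite:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats every result into one in-memory buffer, then writes the buffer
// with a single fwrite call. Returns the number of bytes written (0 if the
// file could not be opened).
std::size_t write_results(const char* path, const std::vector<float>& results) {
    std::string buffer;
    buffer.reserve(results.size() * 16);  // rough per-line estimate for "%.9g\n"
    char line[32];
    for (float r : results) {
        const int n = std::snprintf(line, sizeof line, "%.9g\n", r);
        assert(n > 0 && n < static_cast<int>(sizeof line));
        buffer.append(line, static_cast<std::size_t>(n));
    }
    std::FILE* f = std::fopen(path, "w");
    if (f == nullptr) return 0;
    const std::size_t written = std::fwrite(buffer.data(), 1, buffer.size(), f);
    std::fclose(f);
    return written;
}
```

This does not make the disk or terminal faster; it only avoids paying a per-line overhead on top of them.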
Also, you can overlap the output processing with more CUDA processing, or “pipeline” it: while the main CPU outputs problem piece no. 1, CUDA starts processing problem piece no. 2, so more work gets done in the same time. Edit: What I mean is, if you have many pieces to process, start a separate CPU thread to manage the output, if you can.
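The pipelining idea can be sketched with one output thread fed by a queue. This is a minimal C++ sketch, assuming each piece's result can be turned into a string; pipeline, compute and write are hypothetical names. In a CUDA program, compute would launch the kernel for one piece, copy its result back and return it, while the output thread writes the previous piece:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Runs compute(i) for each piece on the calling thread while a separate
// output thread calls write(result) for pieces that are already finished.
void pipeline(int pieces,
              const std::function<std::string(int)>& compute,
              const std::function<void(const std::string&)>& write) {
    std::queue<std::string> ready;  // finished results waiting to be written
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread writer([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            cv.wait(lock, [&] { return !ready.empty() || done; });
            if (ready.empty()) break;  // done and nothing left to write
            std::string item = std::move(ready.front());
            ready.pop();
            lock.unlock();
            write(item);  // slow I/O happens outside the lock
            lock.lock();
        }
    });

    for (int i = 0; i < pieces; ++i) {
        std::string result = compute(i);  // overlaps with writing earlier pieces
        {
            std::lock_guard<std::mutex> guard(m);
            ready.push(std::move(result));
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> guard(m);
        done = true;
    }
    cv.notify_one();
    writer.join();  // all queued results are written before returning
}
```

The queue here is unbounded; if computing is much faster than writing, a real version would probably cap it so results do not pile up in memory.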
You're accelerating the calculations, not the act of writing to a file. When I present results, I show what has actually been accelerated and what the acceleration factor is.
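One way to get those numbers is to time each phase separately, for both the CPU and the GPU version. A minimal C++ sketch using std::chrono::steady_clock; time_ms, PhaseTimes and measure are hypothetical names. Kernel launches return before the kernel finishes, so a GPU phase should call cudaDeviceSynchronize() before returning, or the timer stops too early; CUDA events (cudaEventRecord, cudaEventElapsedTime) are an alternative for timing on the device.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock time of one phase in milliseconds, measured on the host.
double time_ms(const std::function<void()>& phase) {
    const auto start = std::chrono::steady_clock::now();
    phase();  // if this launches GPU work, it should synchronize before returning
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock is monotonic
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Compute and output times for one version (CPU or GPU) of the program.
struct PhaseTimes {
    double compute_ms;
    double output_ms;
    double total_ms() const { return compute_ms + output_ms; }
};

PhaseTimes measure(const std::function<void()>& compute,
                   const std::function<void()>& output) {
    const double c = time_ms(compute);
    const double o = time_ms(output);
    return PhaseTimes{c, o};
}
```

Measuring both versions this way gives two ratios, compute_ms (CPU) / compute_ms (GPU) and total_ms() (CPU) / total_ms() (GPU); reporting both makes clear which part was accelerated.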
It all depends on what is asked of you. If you need to accelerate the piece of code, including writing to a file, by 30x, then you have to accelerate the computing portion, which you have done, and then work on what is most likely taking the most time: writing to the file. For the latter, CUDA can do nothing for you.
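This is Amdahl's law at work. Taking T_CPU = 30.01 ms, T_GPU = 1.16 ms and T_out = 50 ms from the question, even an infinitely fast kernel still leaves the 50 ms of output, so the overall speedup stays below about 1.6x, and a 30x overall target is out of reach until the output step itself gets much faster:

```latex
\begin{aligned}
S_{\text{total}} &= \frac{T_{\text{CPU}} + T_{\text{out}}}{T_{\text{GPU}} + T_{\text{out}}},
\qquad
\lim_{T_{\text{GPU}} \to 0} S_{\text{total}} = \frac{30.01 + 50}{0 + 50} \approx 1.60 \\
S_{\text{total}} \ge 30 &\iff T_{\text{GPU}} + T_{\text{out}} \le \frac{80.01}{30} \approx 2.67\ \text{ms},
\quad \text{impossible while } T_{\text{out}} = 50\ \text{ms}
\end{aligned}
```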