I am doing a small project at school. I implemented my code in CUDA and took some performance measurements on real runs, i.e. running the program with different numbers of threads, changing the problem size, changing both, etc.

The speedup results look really nice. But then I was asked to do some theoretical analysis of the algorithms, specifically to predict the time my implementation will take to run as the number of threads increases. The problem is that the documents I received use “timing diagrams” to model this, but that approach was for problems solved using MPI, and I’m being asked to use exactly the same technique to create a prediction model for CUDA. Is it possible to do this?

How can I predict how a given CUDA program (kernel) will run as I increase the number of threads used for computation? Say that my GPU has 240 CUDA cores. For the sake of simplicity, I am assuming that if I launch a kernel with one thread it will use only 1 core, if I launch 4 threads it will use 4 cores, and so on up to a limit of 240, so when I launch 500 threads there will be no more than 240 cores in use. It is highly likely that this is somehow wrong, but I need something to start the prediction with.
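To make that starting assumption concrete, here is a minimal sketch of the naive model I have in mind, written in Python just to do the arithmetic. The function names, `work_items`, and `time_per_item` are hypothetical placeholders of mine, and the model assumes perfectly even work splitting; it ignores warps, memory latency, and kernel launch overhead, so it is almost certainly too optimistic.

```python
import math

CORES = 240  # CUDA cores on my GPU (the figure from the question)


def active_cores(threads, cores=CORES):
    """Naive assumption: one thread occupies one core, capped at the core count."""
    return min(threads, cores)


def predicted_time(work_items, time_per_item, threads, cores=CORES):
    """Predicted run time if the work is split evenly over the active cores.

    work_items    -- hypothetical number of independent units of work
    time_per_item -- hypothetical time one core needs for one unit
    """
    rounds = math.ceil(work_items / active_cores(threads, cores))
    return rounds * time_per_item


def predicted_speedup(work_items, threads, cores=CORES):
    """Speedup relative to a single-thread launch under the same naive model."""
    t1 = predicted_time(work_items, 1.0, 1, cores)
    tp = predicted_time(work_items, 1.0, threads, cores)
    return t1 / tp
```

Under this model, `predicted_speedup(2400, 4)` gives 4.0, and `predicted_speedup(2400, 500)` saturates at 240.0 because no more than 240 cores are ever counted as busy. That flat ceiling is the part I would compare against my measured speedup curves.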

What approach would you suggest? I am really racking my brain over this, and I cannot come up with a simple prediction model (actually, I believe there’s no simple model for CUDA).