How do I decide whether a given CUDA architecture is sufficient to run my project/code?

Hi,

How can someone find out beforehand whether a given CUDA architecture is sufficient to run their project/program?

I am looking to do a topological sort on the GPU. I think this Tesla architecture has 4 GPUs (128 cores each). How do I decide whether that is sufficient to process a big graph?
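To make the question concrete, here is a rough host-side sketch of the kind of memory estimate I have in mind. It assumes the graph is stored in CSR form with 32-bit ids (so fewer than 2^32 edges), plus an in-degree array and two work queues for a Kahn-style topological sort. The names graph_bytes and fits_on_device and the 0.9 usable fraction are my own placeholders, not from any library. On a real card the device_bytes value would come from cudaGetDeviceProperties (totalGlobalMem) or cudaMemGetInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Estimated bytes for a directed graph in CSR form plus the working
// arrays of a Kahn-style topological sort.
// Assumption: 32-bit vertex ids and edge offsets (hypothetical layout).
std::uint64_t graph_bytes(std::uint64_t num_vertices, std::uint64_t num_edges) {
    const std::uint64_t word = sizeof(std::uint32_t);
    const std::uint64_t row_offsets = (num_vertices + 1) * word;  // CSR row pointers
    const std::uint64_t col_indices = num_edges * word;           // CSR edge targets
    const std::uint64_t in_degree = num_vertices * word;          // remaining in-degree per vertex
    const std::uint64_t queues = 2 * num_vertices * word;         // current and next frontier
    return row_offsets + col_indices + in_degree + queues;
}

// True if the estimate fits in the given fraction of device memory.
// Assumption: usable_fraction leaves headroom for the runtime and
// other allocations; 0.9 is a guess, not a measured value.
bool fits_on_device(std::uint64_t num_vertices, std::uint64_t num_edges,
                    std::uint64_t device_bytes, double usable_fraction = 0.9) {
    const double budget = static_cast<double>(device_bytes) * usable_fraction;
    return static_cast<double>(graph_bytes(num_vertices, num_edges)) <= budget;
}
```

With this estimate, memory rather than core count looks like the first limit to check: a graph that does not fit in one GPU's memory would have to be split across GPUs or streamed from the host, however many cores are available.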

If there are not enough threads, can we assign several chunks of the computation to one thread (instead of one chunk per thread)? This sounds easy, but it may need some work, or maybe it can't be done at all.
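What I have in mind is something like the grid-stride loop pattern, where each thread starts at its own global index and then jumps ahead by the total number of threads until the chunks run out. Below is a small CPU-side sketch of that mapping only. chunks_for_thread is a hypothetical helper name; in an actual CUDA kernel thread_id would be blockIdx.x * blockDim.x + threadIdx.x and total_threads would be gridDim.x * blockDim.x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the chunk indices one thread would process when num_chunks
// chunks are dealt out round-robin over total_threads threads.
// This simulates the index arithmetic of a grid-stride loop on the CPU.
std::vector<std::size_t> chunks_for_thread(std::size_t thread_id,
                                           std::size_t total_threads,
                                           std::size_t num_chunks) {
    assert(total_threads > 0);  // a stride of zero would never terminate
    std::vector<std::size_t> mine;
    for (std::size_t c = thread_id; c < num_chunks; c += total_threads) {
        mine.push_back(c);  // in a kernel, the chunk would be processed here
    }
    return mine;
}
```

For example, with 4 threads and 10 chunks, thread 1 would get chunks 1, 5 and 9, and every chunk is covered by exactly one thread. This only covers the work distribution; the ordering dependencies of a topological sort would still need synchronization between levels.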

Any ideas on how to find out beforehand whether a big graph can be processed on it?