Could a cheap CUDA-enabled card handle this application?

In my application, one CPU thread can generate 50 to 200 threads running in parallel on the GPU.
On an i7, 16 CPU threads could generate 800 to 3,200 threads running in parallel on the GPU.
The GTX 280 has 30 multiprocessors, which allows more than 30,000 active threads.
But my application does not need that many threads on the GPU. Is a cheap card, such as the 9600 GT or 9500 GT,
enough to handle this application?

You’re likely asking the wrong questions.
In most GPU applications, you almost always treat scaling as part of your problem’s design and implementation.

If you’re starting your problem by saying “I only need 200 threads”, then it’s likely that either you’re not working on a GPU-friendly computation or you’re not designing your algorithm to use the power available.

In fact, you rarely think about “thread count” in an app-wide sense. You think about “how many blocks does my app use”, and usually you aim for 100 to 10,000 or so. More blocks are better for scalability across many sizes of GPU. Thread counts (within a block) are usually more of a tuning issue, not an app design issue.
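
To put rough numbers on that, here is a small back-of-the-envelope sketch in plain Python (host-side arithmetic only, not CUDA code). The function names are made up for illustration, the block size of 128 threads is just an assumed tuning value, and the 1,024 resident threads per multiprocessor is an assumption consistent with the "more than 30,000" figure quoted for the GTX 280.

```python
def blocks_needed(num_threads: int, threads_per_block: int = 128) -> int:
    """Number of blocks needed to cover num_threads (ceiling division).

    threads_per_block=128 is an assumed tuning value, not a recommendation.
    """
    if num_threads < 0 or threads_per_block <= 0:
        raise ValueError("num_threads must be >= 0 and threads_per_block > 0")
    return (num_threads + threads_per_block - 1) // threads_per_block


def resident_thread_capacity(multiprocessors: int, threads_per_mp: int) -> int:
    """Upper bound on threads that can be resident on the GPU at once."""
    return multiprocessors * threads_per_mp


# Figures from the question: 800 to 3,200 GPU threads in total.
print(blocks_needed(800))    # 7
print(blocks_needed(3200))   # 25

# GTX 280: 30 multiprocessors, assuming 1,024 resident threads each.
print(resident_thread_capacity(30, 1024))  # 30720
```

Under those assumptions, even the 3,200-thread case comes out to 25 blocks, which is fewer than the 30 multiprocessors on a GTX 280 and well short of the 100-block lower end mentioned above. That is why the block count, not the raw thread count, is the number to look at.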

Perhaps you could explain your app in more detail. That would be more useful than just saying “I need 3200 threads.”