Recommended minimal hardware for trying CUDA programming under Win7

I’d like to get up to speed with CUDA programming for some DSP applications I’m working on. But I’m not running NVidia graphics cards. I plan to buy real hardware later if CUDA and I hit it off, but I’d like to get something running to explore the coding side while I research the higher power hardware requirements.

Any recommendations for very minimal hardware? I do have an inexpensive EVGA GeForce 210 card that I bought as a backup, but I think it’s only got 16 cores. I also spotted some older Tesla M1060 cards for about $40, but I’m not sure they’d be of any use.

What is the name of your “backup card”? I am assuming the “4 cores” specification actually refers to 4 SMs; you can compare that to the 14-16 SMs found in top-of-the-line cards. That card is likely a decent starting point for getting up to speed with CUDA, given that you already own it.

Generally, to use current CUDA versions you need GPUs with compute capability >= 2.0, so an old Tesla M1060 would not work. At this point, I would strongly suggest getting a Maxwell-class GPU (compute capability >= 5.0), unless you need good double-precision performance. Maxwell-class GPUs start at around $100 in the US.
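To make the compute capability thresholds above concrete, here is a minimal sketch that checks a few cards against them. The compute capability values in the table are recalled from NVIDIA's published lists and should be treated as assumptions to verify before buying; the names `COMPUTE_CAPABILITY` and `meets` are hypothetical helpers, not part of any CUDA API.

```python
# Compute capability as (major, minor).
# ASSUMPTION: values recalled from NVIDIA's published tables; verify them.
COMPUTE_CAPABILITY = {
    "GeForce 210": (1, 2),
    "Tesla M1060": (1, 3),
    "GeForce GTX 750 Ti": (5, 0),
}

MIN_SUPPORTED = (2, 0)  # threshold quoted above for current CUDA versions
MIN_MAXWELL = (5, 0)    # threshold quoted above for Maxwell-class GPUs


def meets(card, minimum):
    """Return True if the card's compute capability is at least `minimum`."""
    # Tuples compare element by element, so (1, 3) < (2, 0) < (5, 0).
    return COMPUTE_CAPABILITY[card] >= minimum


for card in COMPUTE_CAPABILITY:
    print(card,
          "| current CUDA:", meets(card, MIN_SUPPORTED),
          "| Maxwell or newer:", meets(card, MIN_MAXWELL))
```

Under these assumed values, neither the GeForce 210 nor the Tesla M1060 clears the 2.0 threshold, while the GTX 750 Ti clears both.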

I’d stay with the new generation of Maxwell: a 750 or 750 Ti for about $65 used on eBay. This will let you use all CUDA features, and performance will translate well to higher-end Maxwell.
If you need double precision, go with Kepler (650 or 650 Ti for $45-50 on eBay).
You can do CUDA with older hardware, but some features won’t be available and performance tradeoffs will be different.

What cards do you have now?

Sorry, the card is an EVGA GeForce 210, and it has 16 cores. I realized my mistake shortly after posting, and edited my post, but it looks like you had replied already (that was quick!)

I haven’t been doing graphics, and I’m not a game-player, so I just ran video from the CPU (i7-4790k).

I’ll forget about the older Tesla 1060 then (good thing I asked!). I’ll look into the Maxwell cards that both of you have recommended. I’ll have to read up re the double precision comment. I take it that the lower end Maxwells don’t do double precision. Lots of reading to do, but this is what I needed to get the card ordered. Thanks!

All GPUs with compute capability >= 2.0 support double precision in hardware. This means that all GPUs currently supported by CUDA, and all Maxwell-class GPUs in particular, have native DP support. However, the DP performance of Maxwell-class GPUs is poor (roughly in the 50-100 GFLOPS range), and your CPU may exceed the DP performance of these GPUs. As MrrVlad stated, for good double-precision performance with CUDA you would want a GPU with compute capability 3.5 or 3.7.
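A rough back-of-the-envelope comparison shows why the CPU can win at double precision. This is only a sketch of theoretical peaks, assuming reference base clocks, a GTX 960 as the example Maxwell card (1024 cores at 1.127 GHz), Maxwell's 1/32 DP-to-SP ratio, and 16 DP FLOPs per cycle per core for a Haswell CPU with AVX2 FMA. Real-world throughput will be lower on both sides, and `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle):
    """Theoretical peak in GFLOPS: units * GHz * FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


# ASSUMPTION: GTX 960 reference specs; one FMA per core per cycle = 2 FLOPs.
gtx960_sp = peak_gflops(1024, 1.127, 2)
# ASSUMPTION: Maxwell runs double precision at 1/32 of the SP rate.
gtx960_dp = gtx960_sp / 32

# ASSUMPTION: i7-4790K at its 4.0 GHz base clock, 4 cores,
# 2 FMA units * 4 doubles * 2 FLOPs = 16 DP FLOPs per cycle per core.
cpu_dp = peak_gflops(4, 4.0, 16)

print(f"GTX 960 SP peak: {gtx960_sp:.0f} GFLOPS")
print(f"GTX 960 DP peak: {gtx960_dp:.0f} GFLOPS")
print(f"i7-4790K DP peak: {cpu_dp:.0f} GFLOPS")
```

With these assumed figures the GPU lands at about 72 GFLOPS in double precision against 256 GFLOPS for the CPU, which is consistent with the range quoted above.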

A GTX 750 Ti (sm_50) is the best bang for buck for introductory low-cost general SP CUDA development. It’s also unique in that it doesn’t require an extra power connector (unless you buy an exotic model).

If you can manage to get a $160 GTX 950 you gain the very latest sm_52 architecture and debugging features like instruction-level profiling and, if it matters to you, HEVC video encode/decode. The 950 requires a 6-pin power connector.

I second allanmac’s post. The 750, 950 and 960 all offer great value for the price. In dense linear algebra applications a 960 is nearly as fast as a GK110 based card like a K40 (and uses a lot less power). For sgemm I think it’s like 2.6 Tflops for a 960 vs 3.0 Tflops for a K40.

My convolution kernels on a 960 can actually beat cuDNN on a K40.

Great. I’ll go with the Maxwell series then. Thanks for the explanation about double-precision. I don’t think that will be a major drawback for getting up to speed.

I haven’t been able to find out very much about differences between the 750 and the 750 Ti. Both are available now with 2GB, so the only obvious bullet-point difference is the addition of 128 more CUDA cores for the Ti. Is that a compelling reason in itself, or are there further enhancements (like devel/debug features) that are not evident from the specs? (I’ve called tech support for a couple of graphics card manufacturers, and they don’t seem to know any of this, so it’s great to have feedback here from developers.)

Impressive, Scott! I’ll probably follow my usual trajectory, which is starting with a vague notion of throwing $20 at a little weekend side-project, and ending up spending hundreds on high-speed hardware and tech books. :-)
This is getting interesting!

If you’re just exploring CUDA, the number of cores isn’t that important. I haven’t had a chance to play with instruction-level profiling yet, so I’m not sure how compelling a feature it is. I’m already pretty good at interpreting the output of nvprof and narrowing down where my bottlenecks are without it. But I suppose it could help.

Another question for anyone still tuned in:

I’ve heard conflicting reports about whether onboard Intel HD Graphics (direct from CPU) can be used as the primary video output, while using an NVidia card only for CUDA processing. From many comments here, I’ve been assuming that the NVidia card could be treated pretty much as a stand-alone (non-graphics) CUDA processor. But EVGA’s tech support indicates that cannot be done; that only cards from the same vendor can be run simultaneously.

Anyone happen to know about this? I thought that there were debugging features that were available only if the card was not being used for video. If it matters, the motherboard is Gigabyte with Z97 chipset.

You can definitely use headless CUDA cards.

I’m on a similar ASUS Z97-WS + i7-4790 and sometimes drive the monitor with the HD4600 IGP or one of the several CUDA cards in the workstation while debugging on a headless card.

I can’t speak for the Gigabyte BIOS but it does work on the ASUS board.

That’s encouraging, AllanMac. One of the EVGA techs warned me that it was not going to work, but I just spoke to someone there who sounded far more knowledgeable, and he confirmed that there would be no problem. Evidently some ‘business class’ (dumbed down) BIOSes auto-disable other video chips when a stand-alone card is detected. Maybe that’s what caused the confusion.

As I’m studying, I’m getting more interested in this, so I’ll probably spend a bit more to get the 960. Comments from you and Scott sound like it will be worthwhile. Looking at NewEgg, I get the impression that the 960 has replaced the 950.

Now I’ve got to determine whether I need 4GB of RAM. I’m thinking that 2GB will do for development, but that may bear some more research.

BTW, I’m impressed with EVGA. Their techs are there 24-7, and I’ve spoken to a couple who really know this stuff.

Good point above about the amount of RAM. I was going to opt for 2GB, but the 960 is available with 4GB. Is that worthwhile for development? Probably doing audio DSP and image analysis on still frames.

BTW, it appears that I am indeed following my normal trajectory, as the 960 could cost about 10 times as much as the original experimental toy board that sparked my interest. But it looks like the 960 could be a serious development tool (if you guys are using it), so I may not need to step up from that for a long while. Hence my previous question about whether 4GB of RAM would be worthwhile.

Before this thread falls off the edge, I just wanted to thank those who took the time to provide valuable info. Probably saved me from an unwise purchase, and I’ve been able to get hardware ordered early.

When in doubt, always get the bigger memory version. You never know what you will use the card for in the future.
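For a feel of what 2 GB versus 4 GB means for the audio and still-frame work mentioned above, here is a small sizing sketch. The data shapes are assumptions chosen for illustration (1920x1080 RGB frames and 48 kHz stereo audio, both stored as 32-bit floats), and the estimate ignores memory taken by the display, the CUDA context, and intermediate buffers.

```python
GIB = 1024 ** 3
BYTES_PER_FLOAT32 = 4


def frame_bytes(width, height, channels):
    """Bytes for one image stored as float32 per channel."""
    return width * height * channels * BYTES_PER_FLOAT32


def audio_bytes_per_second(sample_rate, channels):
    """Bytes per second of audio stored as float32 samples."""
    return sample_rate * channels * BYTES_PER_FLOAT32


# ASSUMPTION: illustrative shapes, not taken from the thread.
frame = frame_bytes(1920, 1080, 3)
audio = audio_bytes_per_second(48_000, 2)

for gib in (2, 4):
    total = gib * GIB
    frames = total // frame
    minutes = total / audio / 60
    print(f"{gib} GiB: about {frames} frames, or {minutes:.0f} minutes of audio")
```

Under these assumptions a single frame is roughly 24 MB, so even the 2 GB card holds dozens of frames at once; the larger card matters mostly if you keep many intermediate buffers resident or move to bigger data sets later.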

Some good insight into a very helpful topic, but in two years things have changed.

I’ve found the GeForce GTX 1050 Ti to show the most promise, having Compute Capability of 6.1 and 4 GB. On the other hand, on chat I was told that the Quadro K2000 would be better, even though it has Compute Capability of 3.0 and only 2 GB, mostly because it was designed for programming as opposed to games.

(I found the Compute Capability numbers on and I know what they mean from

Any reason to believe that the K2000 would be worth more than the GTX 1050 Ti?


In 2013 I faced a similar question. Go cheap on hardware first to learn and to explore the CUDA ecosystem. My choice would be a GT 1030 (for example for evaluation work. Make sure to pick one with an active cooler/fan. If you already know you need more memory and you can spare the money, go for a used 960 or 970 with 4 GB of memory or a new 1050 Ti 4 GB. I might add that you want at least as much system RAM as you have GPU memory.

My first CUDA card was a Quadro FX 570 in early 2013. So don’t worry too much about the hardware; I could learn a lot even with this old 256 MB clunker launched in 2007. I switched horses to a GT 610 after a few months and again to a 960 4 GB (which I use today). A few months ago I bought a 1060 3 GB for my sister, who is learning CUDA and chemistry (some gaming too^^). Great hardware, good value for the buck! But it was also way more expensive than the cheap GT 1030, as I paid ~200 Euros.