CUDA on integrated graphics?

I’m seeing announcements of NVIDIA motherboards with DX10-capable integrated graphics, due next year, that are described as “GeForce 8”. NVIDIA engineers have made unequivocal commitments to supporting CUDA on all newly released discrete cards since the G80, but does this extend to integrated graphics?

Is this a clear yes or no at this stage, or a ‘wait and see’? If yes, what differences would CUDA on an IGP see? Would there be direct access to a portion of system memory?

Geoff.

I assume the answer is yes: everything from G80 onward supports CUDA, because support for compute programs, in addition to geometry, pixel and vertex programs, is built in.

PCI Express cards already have access to (part of) system memory, as AGP cards did, so I don’t think an IGP would look any different at the API level.

If you are talking about this announcement:
http://www.betanews.com/article/NVidia_Bre…hips/1190737131

Then those are based on the GeForce 7 design, so no CUDA.

seibert: Nope, not that announcement:

http://www.dailytech.com/NVIDIA+Prepares+N…article9035.htm

wumpus: I’m not sure there have ever been promises that specifically extend to IGP. And while access to system memory over PCI-E is something discrete cards have in common with IGPs, CUDA doesn’t expose that feature any more. I imagine that quite a bit of customization would be required to run CUDA in this new environment, but NVIDIA might well plan to do it.

What I like about programming on an 8600 is that it’s a very regularly scaled-down version of an 8800. Every component is slower by the same factor of 3-4x, so when I optimize for the 8600 I’m also optimizing for an 8800.

An integrated part, however, might not have this balance, in which case I don’t see the point of a $50 saving (mobo with the IGP vs. cheaper mobo + $100 8600).

As mentioned in the original post, it all depends on whether or not you can exploit the fact that the GPU and the CPU share the same memory. If yes, this could give some algorithms and applications a significant advantage.

Alex: yes, I also think the straightforward scale-down (or up, perhaps in future) is one of the coolest things about CUDA. I agree that if you were looking to buy a mobo + GeForce 8 combination with an eye to running CUDA, then IGP probably isn’t the way to go.

However, IGP running CUDA is still interesting for several reasons:

  1. Increasing the potential user base for any CUDA application,

  2. Space-constrained or slot-constrained systems where there just isn’t room for discrete graphics,

  3. Direct access to main memory (although, as I said above, it’s very speculative whether this will be put in place and just work),

  4. Hybrid SLI-type approaches (I know CUDA doesn’t work on SLI, it’s just an analogy). That is, if you have a CUDA application that responds to fluctuating load patterns, then you could keep it on IGP when load is low and keep the discrete GPU off except when load is high.
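To make point 4 a bit more concrete, here is a minimal sketch of a load-based switching policy with hysteresis. Everything in it is hypothetical: the `choose_device` helper, the device labels and the thresholds are made up for illustration, and nothing here implies that CUDA or the driver offers a way to power a discrete GPU on and off.

```python
def choose_device(load, current, high=0.75, low=0.25):
    """Pick "igp" or "discrete" for a load in [0, 1].

    Hypothetical policy: switch up only above `high` and back down only
    below `low`, so small load fluctuations do not cause constant switching.
    """
    if current == "igp" and load > high:
        return "discrete"
    if current == "discrete" and load < low:
        return "igp"
    return current


# Replay a made-up load trace, starting on the IGP.
device = "igp"
trace = []
for load in [0.1, 0.5, 0.8, 0.5, 0.2, 0.1]:
    device = choose_device(load, device)
    trace.append(device)

print(trace)
```

The gap between the two thresholds is the design choice that matters: with a single threshold, a load hovering around it would bounce the work between the two devices.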

Geoff.

I think NVIDIA wants video games to use CUDA too, so enabling CUDA on integrated graphics is probably a priority.

However, direct access to memory is unlikely. Don’t the chipsets always hide that memory from the operating system? HyperMemory is driver-based, but I think shared memory might be hardware.

But in any case, do realize these IGPs won’t be any faster than CPUs. Even an 8500 GT has the same theoretical single-precision FLOPS as a dual-core Core 2 (28 GFLOPS, when you don’t count the imaginary post-filter MUL). An 8300 GS drops that down to 14. Where might an IGP fit into the spectrum?

I mean there might still be some benefits. The power-conservation example is a good one if implemented, and even if it’s no more powerful than a CPU it’s still better than nothing. But it’s not something to get excited over, unfortunately.
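For reference, the peak figures quoted above can be reproduced with simple arithmetic. This is only a sketch: the unit counts and clocks below are assumptions that are not stated in the thread (16 and 8 scalar processors at a 900 MHz shader clock for the two GeForce parts, and a 1.8 GHz dual-core CPU issuing one 4-wide SSE add and one 4-wide SSE multiply per cycle).

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_cycle):
    """Theoretical single-precision peak: units x clock x flops per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


# Assumed GPU specs: 900 MHz shader clock, one multiply-add (2 flops) per
# scalar processor per cycle; the extra MUL is deliberately not counted.
geforce_8500_gt = peak_gflops(16, 0.9, 2)
geforce_8300_gs = peak_gflops(8, 0.9, 2)

# Assumed CPU: 2 cores at 1.8 GHz, 8 single-precision flops per core per cycle.
core2_duo = peak_gflops(2, 1.8, 8)

for name, value in [
    ("8500 GT", geforce_8500_gt),
    ("8300 GS", geforce_8300_gs),
    ("Core 2 Duo", core2_duo),
]:
    print(f"{name}: {value:.1f} GFLOPS")
```

Under those assumptions the 8500 GT and the dual-core CPU land on the same peak, and the 8300 GS on half of it, which matches the rounded 28 and 14 in the post.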

FLOPS is not the only (or even a correct) way to measure performance benefits. In addition to more raw processing power, CUDA-capable cards come with high memory bandwidth and extremely lightweight thread switching. A few apps show over 100x speedup on Tesla cards compared to a C2D, and most apps are at least 10 times faster (see the CUDA/Tesla pages for references). So even a very scaled-down version of a G80 card is still likely to beat a CPU in those cases.
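One rough way to see why bandwidth matters as much as FLOPS is a simple bound: a kernel cannot run faster than the compute peak, nor faster than memory bandwidth times the number of flops it performs per byte moved. The sketch below uses round, made-up numbers for a discrete card and a CPU; they are illustrative assumptions, not measurements of any real part.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Upper bound on throughput: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical streaming kernel: one flop per 4-byte float read.
intensity = 0.25

# Illustrative round numbers only (peak GFLOPS, memory bandwidth in GB/s).
discrete_gpu = attainable_gflops(peak_gflops=300.0, bandwidth_gb_s=80.0,
                                 flops_per_byte=intensity)
cpu = attainable_gflops(peak_gflops=30.0, bandwidth_gb_s=10.0,
                        flops_per_byte=intensity)

print(f"GPU bound: {discrete_gpu:.1f} GFLOPS")
print(f"CPU bound: {cpu:.1f} GFLOPS")
print(f"ratio: {discrete_gpu / cpu:.1f}x")
```

For a kernel like this, both devices are bandwidth-bound, so the ratio tracks the bandwidth gap rather than the FLOPS gap.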

Paulius

Dude, come on… IGP means memory bandwidth is shared with the CPU. “Most apps at least 10x faster” also sounds grossly inaccurate, especially since you chose the word “application”, not “inner loop.”
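The application versus inner loop distinction can be put in numbers with Amdahl's law: if only a fraction of the runtime is accelerated, the rest caps the overall gain. A minimal sketch follows; the 90% fraction and the 100x kernel speedup are made-up inputs, not figures from any real application.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


# Hypothetical case: the inner loop is 90% of the runtime and runs 100x faster.
app = overall_speedup(0.9, 100.0)
print(f"inner loop: 100x, whole application: {app:.2f}x")
```

With those inputs the whole application ends up a little over 9x faster, so a 100x inner loop and a sub-10x application are not contradictory.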