How is SLI treated? (specifically, GeForce 295)

I’m thinking about building a new CUDA system, and I’ve been wondering how SLI is treated. My current thought is to buy a board with 4 PCIe 2.0 slots and 4 GeForce GTX 295s. Would I be able to use all 1920 or so stream processors on a single job? Can jobs span multiple cards via SLI, or are you limited to the cores on just one card? (For that matter, is the 295 treated as 1 card or 2?)

Thanks

I have a GTX295 card. The GTX295 has two independent GPUs, each with 240 cores and 896 MB of RAM.
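If you want to check how the runtime sees your cards, a minimal sketch like the one below should list every GPU as its own device. It only uses cudaGetDeviceCount and cudaGetDeviceProperties from the CUDA runtime; the helper bytes_to_mb is a name I made up for the printout. On a single GTX295 I would expect two entries, but the exact names and numbers depend on your driver and hardware.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): convert bytes to whole megabytes.
size_t bytes_to_mb(size_t bytes) {
    return bytes / (1024 * 1024);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Each GPU is reported as a separate device, even when two GPUs
    // sit on the same physical card.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }
        std::printf("device %d: %s, %d multiprocessors, %zu MB global memory\n",
                    dev, prop.name, prop.multiProcessorCount,
                    bytes_to_mb(prop.totalGlobalMem));
    }
    return 0;
}
```

Compile with nvcc (for example `nvcc list_devices.cu -o list_devices`, where the file name is just a placeholder).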

For example, if you run the matrixMul example from the SDK, it uses only one GPU of the GTX295.

If you want to use both GPUs of the GTX295, you need 2 host threads, each bound to one GPU. The memory is not pooled either: each GPU has its own 896 MB, so you cannot allocate 1 GB on a single GPU.
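Here is a rough sketch of the one-host-thread-per-GPU pattern, assuming a toolchain that supports C++11 std::thread. The kernel scale, the Range struct, and the helpers split_range and worker are hypothetical names for illustration, not anything from the SDK. Each thread calls cudaSetDevice for its own GPU, allocates only its slice of the data in that GPU's memory, and copies the result back. As far as I know, CUDA 4.0 and later also let a single host thread switch devices with cudaSetDevice, so treat the threading as one option rather than a requirement on newer toolkits.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element by `factor` in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Half-open element range [begin, end) assigned to one device.
struct Range { int begin; int end; };

// Split n elements as evenly as possible across num_devices devices.
Range split_range(int n, int num_devices, int dev) {
    int base = n / num_devices;
    int rem = n % num_devices;
    int begin = dev * base + (dev < rem ? dev : rem);
    int end = begin + base + (dev < rem ? 1 : 0);
    return {begin, end};
}

// Runs on its own host thread and works only with GPU `dev`.
void worker(int dev, float* host, Range r) {
    int n = r.end - r.begin;
    if (n == 0) return;
    if (cudaSetDevice(dev) != cudaSuccess) return;

    float* d = nullptr;
    size_t bytes = n * sizeof(float);
    // The allocation lives in this GPU's own memory; it is not shared.
    if (cudaMalloc(&d, bytes) != cudaSuccess) return;

    cudaMemcpy(d, host + r.begin, bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, n, 2.0f);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host + r.begin, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }

    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);

    // One host thread per GPU, each handling a disjoint slice of the array.
    std::vector<std::thread> threads;
    for (int dev = 0; dev < count; ++dev) {
        threads.emplace_back(worker, dev, host.data(),
                             split_range(n, count, dev));
    }
    for (auto& t : threads) t.join();

    std::printf("host[0] = %f, host[n-1] = %f\n", host[0], host[n - 1]);
    return 0;
}
```

The point of the design is that the job is split by hand on the host: nothing here relies on SLI, and no single allocation is larger than what one GPU holds.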

If you buy 4 GTX295 cards, then you have 8 independent GPUs.