Can we access the entire 1 GB on the new GTX's?

I had a question about the GTX 280 -
Can we access the entire 1GB global memory space on the card?
Or is it like the 9800 GX2 where you have the memory available but are limited in using it?

I understand that the 9800 GX2 is effectively two independent GPUs on one card, hence the 512 MB per-GPU limit.
I just want to make sure there is nothing like that on the 280 (or the 260) that I should be aware of before I start using them. I don't want to hit a wall halfway through, all because I chose the wrong hardware.

Clarifications from nvidia folks appreciated!


There are 2 devices supporting CUDA

Device 0: "GeForce GTX 280"
  Major revision number:                         1
  Minor revision number:                         3
  Total amount of global memory:                 1073020928 bytes
  Number of multiprocessors:                     30
  Number of cores:                               240
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.30 GHz
  Concurrent copy and execution:                 Yes

As with all CUDA cards, you may not be able to allocate all of that 1 GiB, though. Typically around 50 MiB is unallocatable, and more if you run a compositing window manager.
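If you want to see how much of the 1 GiB is actually available at runtime, the runtime API can report free vs. total device memory. A minimal sketch (note: `cudaMemGetInfo` is the runtime-API call in newer toolkits; older toolkits only exposed the driver-API `cuMemGetInfo`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;

    // Reports free and total memory on the current device; the
    // difference is what the driver, display, and any existing
    // allocations are already using.
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("Free:  %zu bytes\n", free_bytes);
    printf("Total: %zu bytes\n", total_bytes);
    return 0;
}
```

Running this on a display-attached card vs. a dedicated compute card makes the overhead difference obvious.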

Thanks MisterAnderson42 for the quick reply!

Anything I need to know before I fire it up?
I am currently using the 8800 GTX, and as I understand it, I should be able to use the new 280 the same way, just with larger data sets?
The 512 MB on the 8800 limits me substantially, so I am planning to make at least some progress with the 280, which doubles the available space. I may move to the Tesla C1060 after a few months (as budget allows).

Any comments/hints welcome.

I have successfully allocated >= 1000000000 bytes on a card that is not attached to a display (I don't remember the exact number). Be aware, however, that kernel code and potentially execution parameters also need some memory on the card.
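If you want the exact number for your own setup, one pragmatic approach is to probe with `cudaMalloc`, stepping the request size down until an allocation succeeds. A sketch (the 1 GiB start and 16 MiB step are arbitrary choices; fragmentation means free memory is not always available as one contiguous block):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Find the largest single cudaMalloc that succeeds, stepping
// down from `start` in `step`-byte decrements.
size_t probe_max_alloc(size_t start, size_t step) {
    for (size_t sz = start; sz >= step; sz -= step) {
        void* p = nullptr;
        if (cudaMalloc(&p, sz) == cudaSuccess) {
            cudaFree(p);          // release it immediately
            return sz;            // first size that fits
        }
        cudaGetLastError();       // clear the failed-alloc error
    }
    return 0;                     // nothing fit at all
}

int main() {
    // Start at 1 GiB, step down in 16 MiB increments.
    size_t best = probe_max_alloc(1ULL << 30, 16ULL << 20);
    printf("Largest single allocation: %zu bytes\n", best);
    return 0;
}
```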

Yep. That is what is great about CUDA: any app scales to more processors without any problems or code modifications.

[DISCLAIMER] Replace ‘any app’ with ‘a normal app with enough blocks’ [/DISCLAIMER]

Because I have seen some code floating around on the forums that would not scale ;)

True. But only if one expects the performance to scale as well :)