Introducing New Tesla Fermi webinar: missed the first 15 min

Hi folks, I wonder if anyone here attended this webinar.
I’ve read that the speaker answered questions related to “device-to-device memory transfer” and “up to 1 TB of memory”.
If you heard this part of the talk, could you sum up what was said on these two topics?

Thanks a lot

Found these 3 slides online:


Device to device memory transfer:

The speaker said GDDR5 is going to be much faster than GDDR3, but he does not have specific numbers yet.

I watched it. For the 1 TB of memory question, he (Sumit Gupta) said that the Fermi architecture can address that amount of memory; I assume that means they’re using 40-bit addressing (2^40 bytes = 1 TB). He also said that while the new Teslas will have 3 GB or 6 GB of memory (depending on the model), they are looking into even higher amounts for future devices.

FERMI has 2 DMA engines, so you can move data into and out of GPU memory (from/to system RAM) simultaneously, along with kernel execution…

This feature is necessary because of their multiple-kernel execution strategy. It opens up parallelism at a new level.
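A minimal sketch of how the two copy engines could be exploited with streams and pinned memory (kernel name, sizes, and chunking are hypothetical, not from the talk):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20, half = n / 2;
    float *h, *d;
    cudaMallocHost((void **)&h, n * sizeof(float)); // pinned memory: required for async copies
    cudaMalloc((void **)&d, n * sizeof(float));

    cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);

    // Split the work into two chunks on two streams: while one chunk's
    // kernel runs, the other chunk's upload/download can proceed on the
    // two DMA engines.
    for (int i = 0; i < 2; ++i) {
        float *hp = h + i * half, *dp = d + i * half;
        cudaMemcpyAsync(dp, hp, half * sizeof(float), cudaMemcpyHostToDevice, s[i]);
        scale<<<(half + 255) / 256, 256, 0, s[i]>>>(dp, half);
        cudaMemcpyAsync(hp, dp, half * sizeof(float), cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();
    return 0;
}
```

On pre-Fermi parts with a single copy engine, the two streams' transfers would serialize even though the API calls look the same.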

And, yeah, 1 TB of RAM. Cool. And unified pointer support. Thus, at run time, the memory unit can determine whether a pointer is shared or global… That means not all 64 bits are used for global memory.
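A small sketch of what unified (generic) addressing buys you: the same device function can be handed a pointer into either shared or global memory, and the hardware resolves the address space at run time (function and variable names here are illustrative):

```cuda
// The same function works for a pointer into shared OR global memory;
// on Fermi the load hardware figures out the space at run time.
__device__ float sum3(const float *p) {
    return p[0] + p[1] + p[2];
}

__global__ void demo(const float *g_in, float *g_out) {
    __shared__ float s[3];
    int t = threadIdx.x;
    if (t < 3) s[t] = g_in[t] * 2.0f;
    __syncthreads();
    if (t == 0) {
        g_out[0] = sum3(g_in); // global pointer
        g_out[1] = sum3(s);    // shared pointer, passed as a generic pointer
    }
}
```

On earlier architectures, loads from different address spaces used distinct instructions, so code like this forced the compiler to guess or to clone functions per address space.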

A configurable L1 cache per SM: it can be configured as 16 KB shared memory + 48 KB L1, or vice versa.
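The split can be requested per kernel from the host side; a minimal sketch (the kernel name is hypothetical):

```cuda
#include <cuda_runtime.h>

__global__ void my_kernel(float *d) { /* ... */ }

int main(void) {
    // Prefer 48 KB L1 / 16 KB shared memory for this kernel.
    // The opposite split is cudaFuncCachePreferShared.
    cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferL1);
    return 0;
}
```

Kernels that use little shared memory but have scattered global reads would typically prefer the larger L1; shared-memory-heavy kernels the reverse.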

Unified L2 support

  • 8x double-precision performance, peaking at 50% of single-precision speed

  • ECC support from registers to DRAM

and what not…

I am sure they will be pricing these for elephants…

It would be good if NV allowed developers to submit jobs for FERMI and get results, much like what Intel does with


Long Live NVIDIA!

Thanks a lot Sarnath.
And did he say anything about broadcasting (host-to-multiple-device memory transfer), or device1-to-device2 memory transfer?

That would be really great! I could optimize my programs for FERMI, since it will be the target device in the future.

The PCI-Express bus supports something like this. He said device 1 to device 2 memory transfer was mostly a software thing, IIRC, i.e. it is on their roadmap to support it at some point. But that roadmap contains a lot of things they still want to add in the future…

That’s the only new thing for me. Everything else you mentioned is known for weeks or even months! Was there anything else?

Not in the context of FERMI. That was in the context of building a supercomputer with GPUs; he was dwelling on what kind of issues would need to be sorted out… It’s more hypothetical… Nothing was promised, and there was no roadmap… Just hypothetical…

I don’t think FERMI has that capability.

But well, I see a post above from gshi on that, but the answer looks completely irrelevant… It’s possible that I missed something…

Well, nothing else that I remember… but there was talk on how TESLA cards handle memory coalescing better (the reason the profiler always reports 0 un-coalesced accesses). That was good… TESLA figures out the memory segments accessed by a half-warp, makes coalesced accesses to that set of segments, and routes the data correctly to each thread… There’s a chance of extra memory being fetched (32 bytes being the minimum), but fewer transactions…
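The two access patterns in question can be sketched side by side (illustrative only; `g` is assumed large enough for the strided read):

```cuda
__global__ void reads(const float *g, float *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced: a half-warp (threads 0..15) touches 16 consecutive
    // floats = 64 contiguous bytes, so the hardware can service it
    // with one or two segment transactions.
    float a = g[tid];

    // Strided: the same half-warp now spans 128 bytes of addresses.
    // The hardware still coalesces per segment, but must fetch more
    // segments, and with a 32-byte minimum per fetch, half the bytes
    // pulled in are never used.
    float b = g[2 * tid];

    out[tid] = a + b;
}
```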

How do I remove my post? Please add this to the wishlist.

Ditto… I too want this post-removal feature.