Tesla C2xxx vs GTX 4xx: differences besides cores / memory / clock speed

I was wondering how many of the features from page 2 of the Tesla fact sheet (other than those in the topic description) are also available on the GTX line.
Specifically:

  • NVIDIA PARALLEL DATACACHE
    Accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand. This includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.

  • NVIDIA GIGATHREAD ENGINE
    Maximizes the throughput by faster context switching that is 10X faster than previous architecture, concurrent kernel execution, and improved thread block scheduling.

  • ASYNCHRONOUS TRANSFER
    Turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize the computing efficiency by transferring data to local memory before it is needed.

Source: http://www.nvidia.com/docs/IO/43395/NV_DS_…final_lores.pdf

You can assume the GTX cards have all of them. The exception is double precision performance, which is capped on the GeForce cards (and I think you will find that the GTX 470 and Tesla C2xxx/M2xxx have the same core count).
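
If you want to check a particular card yourself, a minimal sketch along these lines should work, assuming a CUDA toolkit recent enough that cudaDeviceProp has the asyncEngineCount field. It only prints what the runtime reports for the cache, concurrent kernel, and async transfer features listed above. It does not measure the double precision cap, and it reports multiprocessors rather than cores.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev,
                         cudaGetErrorString(err));
            continue;
        }

        std::printf("Device %d: %s (compute %d.%d)\n",
                    dev, prop.name, prop.major, prop.minor);
        // Streaming Multiprocessor count (not the marketing "core" count).
        std::printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        // Unified L2 cache size, 0 if the device has none.
        std::printf("  L2 cache:           %d bytes\n", prop.l2CacheSize);
        // Whether several kernels can run on the device at once.
        std::printf("  concurrent kernels: %s\n",
                    prop.concurrentKernels ? "yes" : "no");
        // Number of copy engines usable for transfers that overlap kernels.
        std::printf("  async copy engines: %d\n", prop.asyncEngineCount);
        // ECC state, one of the items where the product lines can differ.
        std::printf("  ECC enabled:        %s\n",
                    prop.ECCEnabled ? "yes" : "no");
    }
    return 0;
}
```

Build it with nvcc and run it on each machine; comparing the output for a GTX card and a Tesla card shows which of these features the runtime exposes on each.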

Full differences are at:

http://forums.nvidia.com/index.php?showtopic=165055

Sumit