Constant Memory Latency: Latencies of Constant Memory (64 KB) and Constant Cache (8 KB) in GT200

Hi,

I have been reading about CUDA, especially about GT200. However, it is very difficult to find documentation on specific details.

I read a paper from a group in Toronto (http://www.eecg.toronto.edu/~moshovos/CUDA08/arx/microbenchmark_report.pdf) that claims GT200 has 2 KB, 8 KB and 32 KB caches. My first question is whether the NVIDIA documentation supports this, and if so, where I can find it.

As far as I know, GT200 only has an 8 KB constant cache per multiprocessor, plus a 64 KB region of device memory that the host can fill as constant memory.

Either way, I want to know the latency in cycles of the 64 KB constant memory and of the 8 KB constant cache (or the latencies of the 2 KB, 8 KB and 32 KB caches, if those really exist).
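One way to get such numbers yourself is a pointer-chasing microbenchmark, similar in spirit to what the Toronto paper describes: a single thread repeatedly loads the index of its next load from constant memory, so every access waits for the previous one and the elapsed cycles divided by the number of loads approximates the latency. The sketch below is a hypothetical, untested example; the names `c_chain`, `build_chain`, `chase_constant` and `measure_constant_latency` are made up for illustration. It assumes a GT200-class GPU, a toolkit that still targets compute capability 1.3, and that `clock()` counts shader cycles; the result also includes a small per-iteration loop overhead that a careful measurement would subtract.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical benchmark sized to the whole 64 KB constant space (32-bit words).
#define CONST_WORDS (64 * 1024 / 4)

__constant__ unsigned int c_chain[CONST_WORDS];

// Host side: chain[i] holds the index of the next word to load, stride_words
// ahead, wrapping within the first footprint_words entries.
void build_chain(unsigned int *chain, unsigned int footprint_words, unsigned int stride_words)
{
    for (unsigned int i = 0; i < footprint_words; ++i)
        chain[i] = (i + stride_words) % footprint_words;
}

// One thread follows the chain. Each load's address comes from the previous
// load, so the loads cannot overlap and (cycles / iters) approximates the
// load latency plus a small per-iteration loop overhead.
__global__ void chase_constant(unsigned int iters, unsigned int *out_idx, unsigned int *out_cycles)
{
    unsigned int j = 0;
    for (unsigned int i = 0; i < iters; ++i)   // untimed pass to warm the caches
        j = c_chain[j];
    unsigned int start = (unsigned int)clock();
    for (unsigned int i = 0; i < iters; ++i)   // timed pass
        j = c_chain[j];
    unsigned int stop = (unsigned int)clock();
    *out_idx = j;                  // keep j live so the loads are not optimized away
    *out_cycles = stop - start;    // unsigned subtraction tolerates one counter wrap
}

// Returns the average cycles per dependent load, or -1.0 on a CUDA error.
double measure_constant_latency(unsigned int footprint_words, unsigned int stride_words,
                                unsigned int iters)
{
    assert(footprint_words > 0 && footprint_words <= CONST_WORDS);
    assert(stride_words > 0 && iters > 0);

    static unsigned int h_chain[CONST_WORDS];
    build_chain(h_chain, footprint_words, stride_words);
    if (cudaMemcpyToSymbol(c_chain, h_chain, footprint_words * sizeof(unsigned int)) != cudaSuccess)
        return -1.0;

    unsigned int *d_out = nullptr;   // d_out[0] = final index, d_out[1] = cycles
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned int)) != cudaSuccess)
        return -1.0;

    chase_constant<<<1, 1>>>(iters, d_out, d_out + 1);

    unsigned int h_out[2] = {0, 0};
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof h_out, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1.0;
    return (double)h_out[1] / iters;
}
```

To probe the cache sizes, one could keep the stride fixed (for example 64 words, i.e. 256 bytes, assumed to be larger than any cache line) and sweep `footprint_words` over 512, 2048, 8192 and 16384 (2, 8, 32 and 64 KB), with `footprint_words` a multiple of the stride and `iters` at least `footprint_words / stride_words` so the warm-up pass touches the whole chain. If the 2 KB / 8 KB / 32 KB hierarchy is real, the cycles per load should step up as the footprint crosses each of those sizes.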

I hope you can help me.

Nvidia’s documentation contains no information on the cache hierarchy, but the GT200 architecture has been the subject of excellent reverse engineering by the group you cited.

Hi Tera,

Thank you very much for the remark. Now I am very curious about two things:

* Why does Nvidia have no documentation on cache latency?

* Would Nvidia agree with the Canadian group that wrote the paper "Demystifying GPU Microarchitecture through Microbenchmarking"?

Does anybody have an idea of how to answer these questions?