Question about texture and constant caches on the GTX 200

How big are the texture caches (L1 and L2) and the constant cache, and what are their latencies and throughputs?

What about spatial caching with the constant cache? Suppose I have a thread block in which every thread runs the same loop, and in any given iteration all threads access the same element in memory. The loop walks over the entire block of memory in order, and does this several times. Is it better to keep that memory in constant, texture or shared memory?

I.e., a very simplified idea:

for (k = 0; k < K; k++)
{
    for (y = 0; y < Y; y++)
    {
        for (x = 0; x < X; x++)
        {
            out += mem[y]*…;
        }
    }
}

What would be the best memory type for mem? And if it's shared memory, what would be the second best? I may be short on shared memory for this implementation.
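
For concreteness, here is a minimal sketch of what the constant-memory variant might look like. The names (c_mem, accumulate, run_constant_sketch) and the sizes K, Y, X and THREADS are assumptions made up for illustration, and the second factor of the product is left out because it is elided in the loop above. It only shows the access pattern (every thread reading the same c_mem[y] in a given iteration); it says nothing about which memory type is actually fastest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes, chosen only so the sketch is small and runnable.
#define K 2
#define Y 256
#define X 4
#define THREADS 64

// Hypothetical name. 256 floats = 1 KB, well inside the 64 KB constant space.
__constant__ float c_mem[Y];

__global__ void accumulate(float *result)
{
    float out = 0.0f;
    for (int k = 0; k < K; k++)
        for (int y = 0; y < Y; y++)
            for (int x = 0; x < X; x++)
                // In a given iteration every thread reads the same address.
                // The second factor from the original loop is omitted here.
                out += c_mem[y];
    result[threadIdx.x] = out;
}

// Returns true if every thread produced K * Y * X (mem is filled with 1.0f).
bool run_constant_sketch()
{
    float h_mem[Y];
    for (int y = 0; y < Y; y++) h_mem[y] = 1.0f;
    if (cudaMemcpyToSymbol(c_mem, h_mem, sizeof(h_mem)) != cudaSuccess)
        return false;

    float *d_result = NULL;
    if (cudaMalloc((void **)&d_result, THREADS * sizeof(float)) != cudaSuccess)
        return false;

    accumulate<<<1, THREADS>>>(d_result);

    float h_result[THREADS];
    cudaError_t err = cudaMemcpy(h_result, d_result, sizeof(h_result),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < THREADS; t++)
        if (h_result[t] != (float)(K * Y * X)) return false;
    return true;
}

int main()
{
    printf("%s\n", run_constant_sketch() ? "ok" : "mismatch or CUDA error");
    return 0;
}
```

The shared-memory and texture variants would differ only in where mem lives and how it is loaded, so timing this kernel against those would be one way to compare them on a given card.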

Thanks

Is there a “cache line” for constant memory (and if so, what size is it), or are the reads one element at a time?