__constant__ on Fermi being read through global mem

To my surprise, I’ve noticed that __constant__ variables are read through global memory rather than through the constant memory path on Fermi, i.e. when I compile with -arch=sm_20. Compiling with -arch=compute_13 -code=sm_20 instead fixes the problem: constant data is fetched through the constant memory path again, and register usage improves by almost a factor of two.
Is this the intended behavior?


Remember that Fermi has a “unified” address space with some sort of TLB/MMU translation of addresses back into the appropriate address spaces. So what starts out in PTX as a global memory access might not end up that way.

I don’t think that is what happens here. Two observations support this:

  1. Disabling the L1 cache slows the kernel down, and this is the only global memory read in the kernel.

  2. ptxas has a separate instruction for reading through the constant cache (LDU, load uniform). That means going through the constant path is not a property of the memory itself, but of the instruction reading it.

If this unified addressing thing is what makes it behave this way, then I’m really disappointed, since there is no way to control that per function, only globally per module.
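To make the second observation concrete, here is a minimal sketch of two kernels that read the same uniform value, once from a __constant__ variable and once through a const pointer into global memory. All names (c_scale, g_scale, scale_from_constant, scale_from_global, run_scale) are made up for illustration, and error checking is omitted for brevity. It does not claim which load instruction ptxas will pick for either kernel; that is exactly what would have to be checked in the generated code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical scale factor placed in the constant state space.
__constant__ float c_scale;

// Variant A: the factor is read from a __constant__ variable.
__global__ void scale_from_constant(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_scale;
}

// Variant B: the factor is read through a const pointer into global memory.
// The address does not depend on the thread index, so every thread in a warp
// reads the same word.
__global__ void scale_from_global(float *out, const float *in,
                                  const float *g_scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * g_scale[0];
}

// Host helper (hypothetical): runs one variant on n copies of 1.0f and
// returns the first output element.
float run_scale(bool use_constant, float scale, int n = 256)
{
    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
    float *d_in = 0, *d_out = 0, *d_scale = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_scale, sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_scale, &scale, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    if (use_constant)
        scale_from_constant<<<blocks, threads>>>(d_out, d_in, n);
    else
        scale_from_global<<<blocks, threads>>>(d_out, d_in, d_scale, n);

    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_scale);
    return h_out[0];
}
```

Building this with nvcc -arch=sm_20 -cubin and dumping the result with cuobjdump -sass should show which load instruction each kernel ends up with. Note that sm_20 targets need an older toolkit, since recent CUDA releases no longer support Fermi.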


Yeah, since I wrote that I had a look at the PTX 2.3 guide, and I agree that what I said is probably not correct. It seems that generic addressing and TLB translation for loads apply to the global, shared, or local spaces. Looking at the PTX of what I am working on right now, compiled for sm_20, I see the compiler generating a lot of ldu instructions to load from constant float and double arrays in thread-local matrix and vector operations. That raises the question of what makes the compiler generate different code for constant loads under different circumstances.

I’m using a constant structure:

__constant__ module_params_struct constparams;

I wonder if that is a bug… I will try an array of basic types instead to confirm.
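A minimal sketch of that comparison could look like the following. The layout of module_params_struct is not shown anywhere in this thread, so the two fields here (gain, offset) are purely hypothetical, as are constarray, the two kernels, and the run_variant helper. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Hypothetical layout: the real module_params_struct is not shown in the thread.
struct module_params_struct {
    float gain;
    float offset;
};

__constant__ module_params_struct constparams;  // parameters as a struct
__constant__ float constarray[2];                // same values as a plain array

// Variant A: reads the parameters from the constant struct.
__global__ void apply_struct(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * constparams.gain + constparams.offset;
}

// Variant B: reads the same parameters from the constant array.
__global__ void apply_array(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * constarray[0] + constarray[1];
}

// Host helper (hypothetical): runs one variant with gain = 2, offset = 1
// and returns the output element at the given index.
float run_variant(bool use_struct, int index)
{
    const int n = 64;
    module_params_struct h_params = {2.0f, 1.0f};
    float h_array[2] = {2.0f, 1.0f};
    cudaMemcpyToSymbol(constparams, &h_params, sizeof(h_params));
    cudaMemcpyToSymbol(constarray, h_array, sizeof(h_array));

    float *d_out = 0;
    cudaMalloc(&d_out, n * sizeof(float));
    if (use_struct)
        apply_struct<<<1, n>>>(d_out, n);
    else
        apply_array<<<1, n>>>(d_out, n);

    float h_out[n];
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out[index];
}
```

Both variants compute the same values, so any difference in the generated loads or in the register counts reported by -Xptxas -v would come from how the constant data is declared. Building the file once with -arch=sm_20 and once with -arch=compute_13 -code=sm_20 gives the two cases discussed above.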