Hardcoded constants vs. constant memory

I’m experimenting with an algorithm that uses a large set of
read-only data. I’m trying to hardcode the data into the kernel code,
hoping for better performance, since an instruction is fetched only
once but executed by all the threads. Also, this read-only data is
accessed sequentially, which should be perfect for the instruction
cache.

I found, however, that the hardcoded data actually ended up in constant
memory, even when I wrote it in PTX instead of C.

Is there such a thing as a hardcoded constant on G80? My suspicion is
that every constant is moved to constant memory during PTX -> cubin
compilation. Can I force the assembler NOT to move constants out to
constant memory?

But even if hardcoded constants were available, would there be any
advantage to using hardcoded data (instructions) over fetching data
from constant memory? The constant cache could be just as effective as
the supposed instruction caching, but if the instruction cache is
bigger than the constant cache I could gain some extra performance
(given that I can force hardcoding somehow, as opposed to the
assembler moving everything to constant memory).
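
To make the comparison concrete, here is a rough sketch of the two variants
with a simple event-based timing harness. Everything in it is made up for
illustration: the names (c_table, viaConstant, viaLiterals), the tiny table
size, and the launch configuration are assumptions, not my real code.
Whether the literals in the second kernel become immediates or
constant-memory operands is up to the compiler and the target architecture,
so the generated code should be checked (for example with nvcc -ptx, or
cuobjdump -sass where available) before reading anything into the timings.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_COEF 8      // hypothetical table size, kept tiny for illustration
#define N_ITER 1000   // repeat the table walk so the kernels run long enough to time

// Variant 1: the table lives in constant memory and is read via the constant cache.
__constant__ float c_table[N_COEF];

__global__ void viaConstant(float *out, float x)
{
    float acc = 0.0f;
    for (int it = 0; it < N_ITER; ++it)
        for (int i = 0; i < N_COEF; ++i)
            acc = acc * x + c_table[i];   // all threads read the same address
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Variant 2: the same values written as literals in the code.
// The compiler decides where these actually end up.
__global__ void viaLiterals(float *out, float x)
{
    float acc = 0.0f;
    for (int it = 0; it < N_ITER; ++it) {
        acc = acc * x + 1.0f;
        acc = acc * x + 2.0f;
        acc = acc * x + 3.0f;
        acc = acc * x + 4.0f;
        acc = acc * x + 5.0f;
        acc = acc * x + 6.0f;
        acc = acc * x + 7.0f;
        acc = acc * x + 8.0f;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

int main()
{
    const int blocks = 64, threads = 256, n = blocks * threads;
    const float h_table[N_COEF] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));

    float *d_out = 0;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float ms = 0.0f;

    // Warm-up launches so one-time startup cost is not included in the timings.
    viaConstant<<<blocks, threads>>>(d_out, 0.5f);
    viaLiterals<<<blocks, threads>>>(d_out, 0.5f);
    cudaDeviceSynchronize();

    cudaEventRecord(t0);
    viaConstant<<<blocks, threads>>>(d_out, 0.5f);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("constant memory: %f ms\n", ms);

    cudaEventRecord(t0);
    viaLiterals<<<blocks, threads>>>(d_out, 0.5f);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("literals:        %f ms\n", ms);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(d_out);
    return 0;
}
```

With a table this small both variants will likely sit entirely in cache, so
the interesting case is a table large enough to exceed the constant cache.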

If PTX immediates still end up in constant memory, then you can be pretty sure G80 doesn’t support immediates. That makes some sense too, since G80 is designed primarily for floating point and you can’t really fit 4-byte values inside instructions.