How can I define the %envreg special registers?

The PTX documentation refers to driver documentation, but I'm unable to find any information about it.

Description
A set of 32 pre-defined read-only registers used to capture execution environment of PTX program outside of PTX virtual machine. These registers are initialized by the driver prior to kernel launch and can contain cta-wide or grid-wide values.

see http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#special-registers-envreg-32 Section 9.20

Goal: reduce constant memory (cmem) usage. In my test, all of these read-only registers are empty (set to 0):

uint32_t r[32];
asm("mov.b32  %0, %envreg0;":  "=r"(r[ 0]));
asm("mov.b32  %0, %envreg1;":  "=r"(r[ 1]));
asm("mov.b32  %0, %envreg2;":  "=r"(r[ 2]));
asm("mov.b32  %0, %envreg3;":  "=r"(r[ 3]));
asm("mov.b32  %0, %envreg4;":  "=r"(r[ 4]));
asm("mov.b32  %0, %envreg5;":  "=r"(r[ 5]));
asm("mov.b32  %0, %envreg6;":  "=r"(r[ 6]));
asm("mov.b32  %0, %envreg7;":  "=r"(r[ 7]));
asm("mov.b32  %0, %envreg8;":  "=r"(r[ 8]));
asm("mov.b32  %0, %envreg9;":  "=r"(r[ 9]));
asm("mov.b32  %0, %envreg10;": "=r"(r[10]));
asm("mov.b32  %0, %envreg11;": "=r"(r[11]));
asm("mov.b32  %0, %envreg12;": "=r"(r[12]));
asm("mov.b32  %0, %envreg13;": "=r"(r[13]));
asm("mov.b32  %0, %envreg14;": "=r"(r[14]));
asm("mov.b32  %0, %envreg15;": "=r"(r[15]));
asm("mov.b32  %0, %envreg16;": "=r"(r[16]));
asm("mov.b32  %0, %envreg17;": "=r"(r[17]));
asm("mov.b32  %0, %envreg18;": "=r"(r[18]));
asm("mov.b32  %0, %envreg19;": "=r"(r[19]));
asm("mov.b32  %0, %envreg20;": "=r"(r[20]));
asm("mov.b32  %0, %envreg21;": "=r"(r[21]));
asm("mov.b32  %0, %envreg22;": "=r"(r[22]));
asm("mov.b32  %0, %envreg23;": "=r"(r[23]));
asm("mov.b32  %0, %envreg24;": "=r"(r[24]));
asm("mov.b32  %0, %envreg25;": "=r"(r[25]));
asm("mov.b32  %0, %envreg26;": "=r"(r[26]));
asm("mov.b32  %0, %envreg27;": "=r"(r[27]));
asm("mov.b32  %0, %envreg28;": "=r"(r[28]));
asm("mov.b32  %0, %envreg29;": "=r"(r[29]));
asm("mov.b32  %0, %envreg30;": "=r"(r[30]));
asm("mov.b32  %0, %envreg31;": "=r"(r[31]));

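// tid and bid are the thread and block indices (e.g. threadIdx.x and blockIdx.x).
// Only the first thread of the first block prints the values.
if (!tid && !bid) {
   for (int i = 0; i < 32; i++) {
      printf("r[%2d] = %08x\n", i, r[i]);
   }
}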
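
For anyone who wants to reproduce this, here is a more compact, self-contained sketch of the same test. The macro names READ_ENVREG and READ_ENVREG4 and the kernel name dump_envregs are my own, chosen for illustration. The register number has to be a literal because it is part of the PTX text rather than an operand, so the macro pastes it into the string. I use the escaped form `%%envreg`, which is how the inline PTX documentation writes a literal `%`. This only reads the registers; it does not show any way to set them.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads %envregN into r[N]. N must be a literal 0..31.
#define READ_ENVREG(r, N) \
    asm volatile("mov.b32 %0, %%envreg" #N ";" : "=r"((r)[N]))

// Hypothetical helper: reads four registers at once to keep the kernel short.
#define READ_ENVREG4(r, A, B, C, D) \
    READ_ENVREG(r, A); READ_ENVREG(r, B); READ_ENVREG(r, C); READ_ENVREG(r, D)

__global__ void dump_envregs()
{
    uint32_t r[32];

    READ_ENVREG4(r,  0,  1,  2,  3);
    READ_ENVREG4(r,  4,  5,  6,  7);
    READ_ENVREG4(r,  8,  9, 10, 11);
    READ_ENVREG4(r, 12, 13, 14, 15);
    READ_ENVREG4(r, 16, 17, 18, 19);
    READ_ENVREG4(r, 20, 21, 22, 23);
    READ_ENVREG4(r, 24, 25, 26, 27);
    READ_ENVREG4(r, 28, 29, 30, 31);

    // Only the first thread of the first block prints the values.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        for (int i = 0; i < 32; i++) {
            printf("r[%2d] = %08x\n", i, r[i]);
        }
    }
}

int main()
{
    dump_envregs<<<1, 32>>>();

    // Wait for the kernel so the device-side printf output is flushed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

It should build with a plain `nvcc` invocation on the source file; the launch configuration (1 block of 32 threads) is arbitrary.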

I searched the LLVM source code for answers. These registers are read like the performance counters, but there should be a way to initialize them when building the kernel… or before the launch.

They are also referenced in the NVVM and NVRTC libraries, but I have no idea how to set them.

Edit: While “wasting my time” searching for answers, I found this interesting blog article:
http://arrayfire.com/demystifying-ptx-code/

It shows that the OpenCL compiler uses these registers. It would be nice to be able to do the same in CUDA.