How good is constant propagation in nvopencc? Example included

Consider the following class (a simplified version of what I have, for readability):

class Counter {
private:
  int cnt;

public:
  __device__ inline Counter() { reset(); }
  __device__ inline void reset() { cnt = 0; }
  __device__ inline void add(int val) { cnt += val; }
  __device__ inline int get() { return cnt; }
};

I am planning to declare one instance C of Counter at the beginning of my kernel (in local/register memory space).

At various positions throughout the kernel I will call add() and get(), but with the following restrictions:

  • add() is called only with constants known at compile time

  • If add() is called within an if branch, reset() is called after the branch ends, before any other operation on C is performed

  • If add() is called within a loop, reset() is called before the loop and at the end of the loop

The loops and branches, as well as the gap between the end of a branch and the reset(), may be arbitrarily long.
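To make the restrictions above concrete, here is a minimal sketch of the usage pattern I have in mind. The names kernel_body and example_kernel, the flag and iterations parameters, and the constants passed to add() are all made up for illustration. I also read "at the end of the loop" as "at the end of each iteration". The __host__ __device__ qualifiers and the macro guard are only there so the same code can be checked with a plain host compiler; my real class uses __device__ only.

```cpp
// Let the sketch build with a plain host compiler as well as with nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

class Counter {
private:
  int cnt;

public:
  __host__ __device__ inline Counter() { reset(); }
  __host__ __device__ inline void reset() { cnt = 0; }
  __host__ __device__ inline void add(int val) { cnt += val; }
  __host__ __device__ inline int get() { return cnt; }
};

// Hypothetical kernel body: every add() takes a compile-time constant,
// so the value of C at each get() is the same on every path reaching it.
__host__ __device__ inline int kernel_body(bool flag, int iterations) {
  Counter C;              // starts at 0
  C.add(2);
  C.add(3);
  int result = C.get();   // C is 5 here

  if (flag) {
    C.add(4);
    result += C.get();    // C is 9 here
  }
  C.reset();              // reset after the branch, before any other use of C

  C.reset();              // reset before the loop
  for (int i = 0; i < iterations; ++i) {
    C.add(1);
    result += C.get();    // C is 1 in every iteration
    C.reset();            // reset at the end of each iteration
  }
  return result;
}

#ifdef __CUDACC__
// Hypothetical kernel wrapper; out must hold one int per thread of the block.
__global__ void example_kernel(int *out, bool flag, int iterations) {
  out[threadIdx.x] = kernel_body(flag, iterations);
}
#endif
```

In this sketch every get() could in principle be replaced by a literal (5, 9 and 1), which is exactly the transformation I am asking about. One way to check what the compiler actually does would be to generate PTX with nvcc -ptx and see whether cnt survives as a register or local variable.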

My question is: with the above conditions, will nvopencc be able to optimise the object away, so that it uses no memory at all and the final code contains explicit constants?

With a good host compiler I believe that would be so. However, I do not know whether the multithreaded nature of CUDA kernels can cause trouble here, or whether nvopencc supports that kind of optimisation at all.