volatile keyword for class members

Hi,

In my CUDA kernels I have many objects of a class, say ‘Entity’, in global memory. One of the class members
is written and read by threads from different blocks.
Simplified, it looks like this:

class Entity
{
public:
    __inline__ __device__ void setData(float value)
    {
        atomicExch(&_data, value);
    }

    __inline__ __device__ float getData()
    {
        return _data;
    }

private:
    float _data;
    //... and many other attributes
};

I’ve observed that a thread sometimes gets a wrong value when calling getData().
I suspect that, despite the atomic operation in setData(), the value is somehow being cached, so the plain read in getData() returns an old value.

What can I do here?

I tried using the volatile keyword here, but it doesn’t work.
As a workaround, I implemented the following getter:

    __inline__ __device__ float getData()
    {
        // Adding zero atomically leaves _data unchanged and returns its current value.
        return atomicAdd(&_data, 0.0f);
    }

This works, but it seems like a primitive hack…
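
For comparison, here is a minimal sketch of a getter that reads through a volatile-qualified pointer instead of declaring the member itself volatile. It is not a confirmed fix for the problem above. It assumes that a volatile access is enough to force a fresh load from memory on every call, and it keeps _data a plain float because the atomic functions take non-volatile pointers. The kernel writeThenRead and the host helper runSmokeTest are hypothetical names added only to make the snippet compile and run; they check the getter in a single thread and say nothing about visibility across blocks.

```cuda
#include <cuda_runtime.h>

class Entity
{
public:
    __device__ void setData(float value)
    {
        atomicExch(&_data, value);
    }

    __device__ float getData() const
    {
        // Assumption: reading through a volatile-qualified pointer stops the
        // compiler from reusing a previously loaded value, so each call
        // issues a new load.
        return *reinterpret_cast<const volatile float*>(&_data);
    }

private:
    float _data;  // deliberately left non-volatile so atomicExch still compiles
};

// Hypothetical smoke-test kernel: one thread writes, then reads back.
__global__ void writeThenRead(Entity* entity, float* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        entity->setData(42.0f);
        *out = entity->getData();
    }
}

// Hypothetical host helper: runs the kernel and returns the value it read.
float runSmokeTest()
{
    Entity* entity = nullptr;
    float* out = nullptr;
    cudaMalloc(&entity, sizeof(Entity));
    cudaMalloc(&out, sizeof(float));
    cudaMemset(entity, 0, sizeof(Entity));
    cudaMemset(out, 0, sizeof(float));

    writeThenRead<<<1, 1>>>(entity, out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    float result = 0.0f;
    cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(entity);
    cudaFree(out);
    return result;
}
```

If other members of Entity have to be visible together with _data, a volatile read alone does not order those accesses; a __threadfence() between the writes would probably be needed as well.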

Thanks for your help!