Smart pointers and STL implementations in CUDA

I would like to ask whether other CUDA developers are using C++ techniques such as smart pointers within their kernels.

I bumped into this project:

which seems to have a good implementation of smart pointers along with a working STL implementation for the device.

Before investing too much time, I would like to know whether anyone has used this library or something similar. Thrust is useful, but it is really a host-side library for CUDA. I’ve spent quite some time implementing bits of the STL for the device side, but it is pretty hard work to get all the features I need.

Does NVIDIA plan to release a proper implementation of the STL for the device? I think this could boost the productivity of CUDA developers massively.

Does the concept of smart pointers make sense for the device? On the device side, if you allocate an object on the heap with cudaMalloc, it persists across kernel launches, so I don’t see when the destructor of the smart pointer would actually get called. I like the idea of being able to use modern programming practices (RAII), but it’s not clear to me that this will actually work.
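
To make the lifetime question concrete, here is a minimal sketch of the two cases. It assumes a hypothetical `scoped_ptr` and a hypothetical `Tracked` type that I made up for illustration; neither comes from the project above, from Thrust, or from any NVIDIA library. The `HD` macro is also my own shorthand so that the same code builds with nvcc or with a plain C++ compiler.

```cpp
#include <cassert>

// Build with nvcc for device use, or with a plain C++ compiler for a host-only check.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical minimal owning pointer (illustration only, not a library type).
template <typename T>
class scoped_ptr {
public:
    HD explicit scoped_ptr(T* p = nullptr) : ptr_(p) {}
    // Runs when the owner goes out of scope; inside a kernel,
    // new/delete operate on the device heap.
    HD ~scoped_ptr() { delete ptr_; }

    scoped_ptr(const scoped_ptr&) = delete;
    scoped_ptr& operator=(const scoped_ptr&) = delete;

    HD T& operator*() const { assert(ptr_ != nullptr); return *ptr_; }
    HD T* operator->() const { assert(ptr_ != nullptr); return ptr_; }

    // Give up ownership: the destructor will no longer free the object.
    HD T* release() { T* p = ptr_; ptr_ = nullptr; return p; }

private:
    T* ptr_;
};

// Hypothetical payload that counts how many instances are alive.
struct Tracked {
    int* live;
    HD explicit Tracked(int* counter) : live(counter) { ++*live; }
    HD ~Tracked() { --*live; }
};

// Case 1: the object lives and dies inside one scope, so RAII works as usual.
HD int live_after_scope() {
    int live = 0;
    {
        scoped_ptr<Tracked> p(new Tracked(&live));
        // One Tracked instance is alive here.
    }
    return live;  // number of instances still alive after the scope ends
}

// Case 2: ownership is released so the object can outlive the scope,
// which is the situation for data meant to persist across kernel launches.
HD int live_after_release() {
    int live = 0;
    Tracked* raw = nullptr;
    {
        scoped_ptr<Tracked> p(new Tracked(&live));
        raw = p.release();
    }
    int observed = live;  // the object survived; no destructor has run yet
    delete raw;           // someone has to clean up explicitly later
    return observed;
}

#ifdef __CUDACC__
// Each thread would own its own objects; launch with a single thread to inspect the result.
__global__ void lifetime_kernel(int* out) {
    out[0] = live_after_scope();
    out[1] = live_after_release();
}
#endif
```

The first function is the case where RAII clearly helps: temporary objects created and destroyed within a single kernel invocation. The second is the case I am unsure about: once the object has to outlive the scope that created it, nothing calls the destructor unless some later code does so explicitly.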

Anyway, any thoughts are appreciated.