It is possible to use custom allocators with STL containers, placing their storage in managed memory - so in principle the data is accessible to CUDA. What prevents access to these containers from device code is that none of the existing STL implementations declare their member functions and operators as `__host__ __device__`. The only way to access a vector from a kernel that I am aware of is to pass the raw pointer to its first stored element to CUDA, together with the container size.
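To illustrate, the raw-pointer workaround might look roughly like this (a sketch, not tested code; `managed_allocator` is a hypothetical custom allocator built on `cudaMallocManaged`):

```cpp
#include <cuda_runtime.h>
#include <vector>

// Hypothetical allocator that places the vector's storage in
// managed (unified) memory instead of ordinary host memory.
template <class T>
struct managed_allocator {
    using value_type = T;
    managed_allocator() = default;
    template <class U> managed_allocator(const managed_allocator<U>&) {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
        cudaMallocManaged(&p, n * sizeof(T));
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { cudaFree(p); }
};
template <class T, class U>
bool operator==(const managed_allocator<T>&, const managed_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const managed_allocator<T>&, const managed_allocator<U>&) { return false; }

__global__ void scale(float* data, std::size_t n, float f) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    std::vector<float, managed_allocator<float>> v(1024, 1.0f);
    // The container object itself cannot be used in device code;
    // only its storage is accessible, via raw pointer and size.
    scale<<<(v.size() + 255) / 256, 256>>>(v.data(), v.size(), 2.0f);
    cudaDeviceSynchronize();
}
```

The vector's storage is in unified memory, but every access from device code has to bypass the container interface and go through `data()` and `size()`.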
I was thinking that it might be very useful to have a subset of STL containers that work on the host, but also provide access to data from CUDA, mostly in a way that is optimized for read access.
Writes to containers that require memory allocation on the device are a problem, as memory allocated on the device heap is not unified memory and cannot be accessed on the host. One way to allow limited write capability from CUDA would be to reserve() memory for e.g. a vector container on the host and to only allow CUDA to perform inserts/writes up to the reserved capacity. Alternatively, a preallocated pool of managed memory could be provided that CUDA can draw allocations from; the container's custom allocator would need to be aware of this pooling system so that such allocations can be safely freed on the host side. Another issue with write access from CUDA threads is thread safety and locking of the data structure.
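The reserve()-based idea could be sketched like this (again untested; the struct and `device_push_back` are my own invented names, not an existing API). The host allocates capacity up front in managed memory, and device threads claim slots through an atomic cursor, so no device-side allocation ever happens:

```cpp
#include <cuda_runtime.h>

// Sketch: fixed-capacity append buffer in managed memory. The host
// reserves the storage; device threads only claim slots atomically
// and never allocate, so the device heap is never involved.
struct managed_int_buffer {
    int*          data;      // managed storage, allocated on the host
    unsigned int* size;      // current element count (also managed)
    unsigned int  capacity;  // reserved on the host, fixed for the kernel
};

__device__ bool device_push_back(managed_int_buffer buf, int value) {
    unsigned int slot = atomicAdd(buf.size, 1u);
    if (slot >= buf.capacity) return false;  // reserved space exhausted
    buf.data[slot] = value;
    return true;
}

__global__ void collect_evens(managed_int_buffer buf,
                              const int* in, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) device_push_back(buf, in[i]);
}
```

Note that `atomicAdd` can overshoot when the buffer fills up, so after the kernel the host would clamp `*size` to `capacity`; since everything is managed memory, the host can then read the results directly.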
The subset of STL containers that I see the most need for CUDA-compatible device access would be vector, map, set and list. I would also find it reasonable to make the default memory allocator of such containers use managed memory.
Any thoughts on this? Are there decent (not overly complicated) open source STL implementations that could relatively easily be extended to allow use in CUDA?