CUDA Example with C++ vector list?

Does anyone have an example of using a C++-like vector list in kernel code? Yes, I understand libraries like std::vector and thrust aren't supported in kernel code. Can someone point me to something similar that works in kernel code? I only require minimal vector list functionality, such as adding, removing, erasing, etc.

What do you want to achieve? A basic single-threaded vector can be as simple as three pointers, storing the start of the allocation, the end of the allocation, and the end of the occupied data.
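To make the three-pointer idea concrete, here is a minimal sketch of such a vector. It's written as host C++ so it can be tried directly; since CUDA device code is largely C++, the same struct would work inside a kernel if the methods are marked `__device__` (in-kernel `malloc`/`free` have the same signatures as the C library versions). The struct name and growth policy are my own choices, not from any library.

```cpp
#include <cstdlib>
#include <cstring>
#include <cstddef>

// Minimal single-threaded vector: three pointers, as described above.
// In kernel code, mark the methods __device__; the logic is unchanged.
struct IntVec {
    int* begin_;  // start of the allocation
    int* end_;    // one past the last occupied element
    int* cap_;    // one past the end of the allocation

    IntVec() : begin_(nullptr), end_(nullptr), cap_(nullptr) {}
    ~IntVec() { free(begin_); }

    size_t size() const { return (size_t)(end_ - begin_); }

    void push_back(int v) {
        if (end_ == cap_) {                        // out of room: grow
            size_t n = size();
            size_t newcap = n ? 2 * n : 4;         // double the capacity
            int* p = (int*)malloc(newcap * sizeof(int));
            memcpy(p, begin_, n * sizeof(int));
            free(begin_);
            begin_ = p;
            end_ = p + n;
            cap_ = p + newcap;
        }
        *end_++ = v;
    }

    void pop_back() { if (end_ != begin_) --end_; }

    int& operator[](size_t i) { return begin_[i]; }
};
```

This covers push/pop/index only; erase and shrink would follow the same pattern of moving `end_` and shifting elements.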

I need to maintain a vector list that grows and shrinks over time. The data can’t be pre-allocated. I would rather ask first than re-invent the wheel. CUDA has been around for over a decade, so I would assume something like that must exist, ready to use.

And I have existing code that works with std::vector, so if I could find something I could drop in with minimal changes, that uses the same function calls, etc., that would be ideal.

There is no drop-in replacement for std::vector for concurrent access in a kernel.

Yes, that’s apparent. I’m looking for something similar that supports the most minimal functionality. It doesn’t have to do everything, just the basics.

Memory can be dynamically allocated and freed in kernel code, correct? So, I don’t see what the big issue is in maintaining a vector list that can dynamically grow and shrink.
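Yes, in-kernel `malloc` and `free` exist and have the same signatures as the C library versions (allocations come from a device heap whose size can be raised with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes)`). As a sketch of the grow/shrink mechanics, the helpers below are plain C++ so they can be tried on the host, but the same code would work unchanged in device code; the function names are mine, for illustration only.

```cpp
#include <cstdlib>
#include <cstring>
#include <cstddef>

// Grow a buffer of n ints to newcap capacity, preserving contents.
// In a kernel, malloc/free here draw from the device heap.
int* grow(int* old, size_t n, size_t newcap) {
    int* p = (int*)malloc(newcap * sizeof(int));
    memcpy(p, old, n * sizeof(int));
    free(old);
    return p;
}

// Erase element i from a buffer of n ints by shifting the tail down;
// returns the new element count.
size_t erase_at(int* data, size_t n, size_t i) {
    memmove(data + i, data + i + 1, (n - i - 1) * sizeof(int));
    return n - 1;
}
```

Note that in-kernel `malloc` can return `nullptr` if the device heap is exhausted, so real kernel code should check the result.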

A framework for how a non-concurrent-access vector could be set up is here. It's not guaranteed to be defect-free.

For thread-concurrent access, I don’t have a similar example, but for a vector supporting concurrent push_back only, something like this is possible. That one requires knowing an upper bound for the maximum possible size.

Yes, I’m aware this does not directly address your request.

FWIW, AFAIK C++ std::vector is not fully thread-safe either, so the fully thread-concurrent case should not be assumed to be a trivial or solved problem.

It's probably also worth noting that when using in-kernel allocators such as in-kernel `new` or `malloc`, the allocated regions cannot directly participate in host APIs such as cudaMemcpy (this is documented in the programming guide).
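One common workaround for that limitation, sketched below: have the kernel copy its results into a buffer allocated with `cudaMalloc` before it exits, since that buffer can participate in `cudaMemcpy`. The kernel name and sizes are illustrative only, and this sketch is untested.

```cuda
// In-kernel allocations live on the device heap and cannot be passed to
// cudaMemcpy directly; copy results into cudaMalloc'ed memory instead.
__global__ void compute(int* out, int n) {
    int* scratch = (int*)malloc(n * sizeof(int));  // device-heap allocation
    if (scratch == nullptr) return;                // in-kernel malloc can fail
    for (int i = 0; i < n; ++i) scratch[i] = i * i;
    for (int i = 0; i < n; ++i) out[i] = scratch[i];  // copy to cudaMalloc'ed buffer
    free(scratch);
}

// Host side:
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024); // enlarge device heap if needed
// int* out; cudaMalloc(&out, n * sizeof(int));
// compute<<<1, 1>>>(out, n);
// cudaMemcpy(host, out, n * sizeof(int), cudaMemcpyDeviceToHost);
```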

Thanks for the reply. I will check those examples out.

BTW, the vector list I need to maintain is more of a temporary scratch type of data. It stays in the kernel and doesn't need to return to the host, so I hope it works out OK.