Memory usage of thrust::stable_partition (with stencil)

Is thrust::stable_partition’s memory usage similar to thrust::sort_by_key (which allocates buffer space on the order of the input size)?

Or is its memory usage smaller (since the algorithm is somewhat simpler)?

CUDA 9.1, gcc 4.8, CentOS 7, compute capability 6.1 device

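I think you could probably determine this with a custom Thrust allocator: pass it to the algorithm through the execution policy and record the size of each temporary allocation it requests.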
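Here is a rough sketch of the allocator idea, modeled on the pattern in Thrust's custom_temporary_allocation example. The names `tracking_allocator`, `is_even`, `partition_scratch_peak_bytes` and `sort_by_key_scratch_peak_bytes` are made up for illustration and are not part of Thrust. It assumes the CUDA backend routes its temporary storage through the allocator given to `thrust::cuda::par`, and I have not run it against CUDA 9.1 specifically.

```cuda
#include <thrust/device_vector.h>
#include <thrust/partition.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/system/cuda/execution_policy.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical helper (not part of Thrust): forwards to cudaMalloc/cudaFree
// and records the peak number of temporary bytes live at any one time.
struct tracking_allocator
{
  typedef char value_type;

  std::size_t current_bytes;
  std::size_t peak_bytes;
  std::map<char*, std::size_t> live;  // size of each outstanding block

  tracking_allocator() : current_bytes(0), peak_bytes(0) {}

  char* allocate(std::ptrdiff_t num_bytes)
  {
    void* p = 0;
    if (cudaMalloc(&p, num_bytes) != cudaSuccess) throw std::bad_alloc();
    char* ptr = static_cast<char*>(p);
    live[ptr] = num_bytes;
    current_bytes += num_bytes;
    if (current_bytes > peak_bytes) peak_bytes = current_bytes;
    return ptr;
  }

  void deallocate(char* ptr, std::size_t)
  {
    // Look the size up ourselves instead of trusting the size argument.
    std::map<char*, std::size_t>::iterator it = live.find(ptr);
    if (it != live.end()) { current_bytes -= it->second; live.erase(it); }
    cudaFree(ptr);
  }
};

// Illustrative predicate applied to the stencil values.
struct is_even
{
  __host__ __device__ bool operator()(int x) const { return (x % 2) == 0; }
};

// Peak temporary bytes requested by stable_partition (stencil overload).
// The device_vectors themselves do not go through the tracking allocator.
std::size_t partition_scratch_peak_bytes(std::size_t n)
{
  thrust::device_vector<int> data(n), stencil(n);
  thrust::sequence(data.begin(), data.end());
  thrust::sequence(stencil.begin(), stencil.end());
  tracking_allocator alloc;
  thrust::stable_partition(thrust::cuda::par(alloc),
                           data.begin(), data.end(), stencil.begin(), is_even());
  return alloc.peak_bytes;
}

// Same measurement for sort_by_key, for comparison.
std::size_t sort_by_key_scratch_peak_bytes(std::size_t n)
{
  thrust::device_vector<int> keys(n), values(n);
  thrust::sequence(keys.begin(), keys.end());
  thrust::sequence(values.begin(), values.end());
  tracking_allocator alloc;
  thrust::sort_by_key(thrust::cuda::par(alloc),
                      keys.begin(), keys.end(), values.begin());
  return alloc.peak_bytes;
}
```

Call both functions from `main` with the same `n` and print the results next to `n * sizeof(int)`. That should show whether the scratch space of either algorithm grows with the input size on your Thrust version. Compile with nvcc as a .cu file.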

Also, Thrust is open source, so you can read the implementation directly.