Definitely a bubble sort if you have 6 values per thread.
If you have more than about 10, a Shell sort could be faster.
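For comparison, here is a sketch of a conventional Shell sort on a per-thread array. It assumes Knuth's gap sequence (1, 4, 13, 40) and a compile-time size N of at most a few dozen; `shell_sort` is a hypothetical name, not a library function. The gap loop is fixed. In this conventional form, though, the inner gapped-insertion loop stops as soon as an element is in place, so its trip count (and the indices it touches) depends on the data. On a GPU that means some divergence, and the runtime index `j` will typically push `v[]` into local memory instead of registers.

```cpp
// Let the same code build as plain C++ on the host (e.g. for testing);
// under nvcc, __CUDACC__ is defined and the real qualifiers are used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Per-thread Shell sort of N values with a fixed gap sequence
// (Knuth's 3x+1 sequence; gaps >= N are skipped).
// Assumption: N is small (at most roughly 100 for this gap list).
template <typename T, int N>
__host__ __device__ inline void shell_sort(T (&v)[N]) {
    const int gaps[] = {40, 13, 4, 1};
    for (int g = 0; g < 4; ++g) {
        const int gap = gaps[g];
        if (gap >= N) continue;  // gap too large for this array size
        // Gapped insertion sort: h-sort the array for h = gap.
        for (int i = gap; i < N; ++i) {
            T x = v[i];
            int j = i;
            // Data-dependent exit: stop once x is in place.
            while (j >= gap && x < v[j - gap]) {
                v[j] = v[j - gap];
                j -= gap;
            }
            v[j] = x;
        }
    }
}
```

The final gap of 1 is an ordinary insertion sort, which is what guarantees the result is fully sorted; the larger gaps only make that last pass cheap.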
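Both bubble and Shell sort are well suited to per-thread sorts because their access pattern is fixed (Shell sort walks a fixed gap sequence), so there's little or no thread divergence, no pointer or array indirection, and a bounded, predictable run time. All of those properties are GPU friendly. The main drawback is complexity: bubble sort is O(n^2), and Shell sort is also O(n^2) in the worst case with Shell's original gap sequence, though better gap sequences lower that bound.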
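A minimal sketch of the bubble sort case, assuming the values live in a small per-thread array whose size N is a compile-time constant. `compare_swap` and `bubble_sort` are hypothetical helper names, not from any library. The loops have fixed bounds (no early exit when the data is already sorted) and the compare-swap uses selects rather than a branch, so every thread runs the same instruction sequence. Once the loops are unrolled every index is a compile-time constant, so the compiler can usually keep the array in registers rather than local memory. The `#ifndef __CUDACC__` guard is only there so the same code also builds as ordinary C++ for host-side testing.

```cpp
// Let the same code build as plain C++ on the host (e.g. for testing);
// under nvcc, __CUDACC__ is defined and the real qualifiers are used.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Compare-and-swap: afterwards a <= b. Ternaries instead of an if/else
// let the compiler emit select/min/max instructions (no branch).
// Equal values are left in place, which keeps the sort stable.
template <typename T>
__host__ __device__ inline void compare_swap(T& a, T& b) {
    const bool swap = b < a;
    const T lo = swap ? b : a;
    const T hi = swap ? a : b;
    a = lo;
    b = hi;
}

// Per-thread bubble sort of N values with fixed loop bounds: every thread
// performs exactly N*(N-1)/2 compare-swaps in the same order, whatever
// its data. Under nvcc, #pragma unroll fully unrolls both loops; a host
// compiler may ignore the pragma.
template <typename T, int N>
__host__ __device__ inline void bubble_sort(T (&v)[N]) {
#pragma unroll
    for (int pass = 0; pass < N - 1; ++pass) {
#pragma unroll
        for (int i = 0; i < N - 1 - pass; ++i) {
            compare_swap(v[i], v[i + 1]);
        }
    }
}
```

In a kernel you would load, for example, six values into a local `float v[6]`, call `bubble_sort(v)`, and write them back; the array size is deduced from the argument.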
Bubble sort is stable, but Shell is not. This may or may not matter to your application.
I have my own app that uses 12 items per thread and bubble worked fine.