Possible thrust::sort bug with device_vector

I ran into a tricky bug recently.

The bug appeared after upgrading PyTorch from 1.7 to 1.10.

The bug can be reproduced by calling

torch.mode(torch.from_numpy(np.array([1,111,1,0,1,0,1444,1,10,4]*600)).to(torch.int64).cuda(0), 0)

A subsequent thrust::sort call then raises RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal.

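Code used for testing:
#include <cstdlib>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
int MAX_N = 10000;
thrust::host_vector<int> h(MAX_N);  // element type assumed to be int
for (int i = 0; i < MAX_N; i++) {
  h[i] = rand() % 998;
}
thrust::device_vector<int> d = h;  // copy the host data to the device
thrust::sort(h.begin(), h.end());  // host sort
thrust::sort(d.begin(), d.end());  // device sort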

Sorting the host_vector always works, but sorting the device_vector fails.