I recently ran into a tricky bug.
It appeared after upgrading PyTorch from 1.7 to 1.10.
It can be reproduced by first calling
torch.mode(torch.from_numpy(np.array([1,111,1,0,1,0,1444,1,10,4]*600)).to(torch.int64).cuda(0), 0)
and then calling thrust::sort, which raises:
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
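For convenience, here is the Python half of the repro as a standalone script. It is only a sketch: the helper names build_input and run_mode are made up for illustration, and it assumes a CUDA build of PyTorch with a GPU at index 0. The thrust::sort call that fails afterwards lives in separate C++/CUDA code and is not part of this script.

```python
import numpy as np
import torch


def build_input() -> np.ndarray:
    """Return the 6000-element array used in the repro (hypothetical helper)."""
    return np.array([1, 111, 1, 0, 1, 0, 1444, 1, 10, 4] * 600)


def run_mode(values: np.ndarray):
    """Run torch.mode along dim 0 on GPU 0, as in the original call."""
    tensor = torch.from_numpy(values).to(torch.int64).cuda(0)
    return torch.mode(tensor, 0)


if __name__ == "__main__":
    if torch.cuda.is_available():
        result = run_mode(build_input())
        # torch.mode returns a (values, indices) named tuple.
        print(result.values.item(), result.indices.item())
    else:
        print("CUDA is not available; the repro needs a GPU.")
```

The torch.mode call itself completes; it is the thrust::sort on a device_vector issued afterwards that fails.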
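Code used for testing:
#include <cstdlib>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

const int MAX_N = 10000;
thrust::host_vector<int> h(MAX_N);  // element type assumed to be int
for (int i = 0; i < MAX_N; i++) {
  h[i] = rand() % 998;  // random values in [0, 997]
}
thrust::device_vector<int> d = h;  // copy to the GPU before h is sorted
thrust::sort(h.begin(), h.end());  // host sort
thrust::sort(d.begin(), d.end());  // device sort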
Sorting the host_vector always works, but sorting the device_vector fails.