I upgraded CUDA and cuDNN so that they support our new H20 GPUs.
Environment:
- CUDA version: from 11.4.2 to 12.2.2
- cuDNN version: from 8.2.14 to 8.9.6.50
- GPU: from A100 to H20
After that, my code started to fail with:
```
terminate called after throwing an instance of 'thrust::system::system_error'
what(): after dispatching inclusive_scan kernel: cudaErrorInvalidResourceHandle: invalid resource handle
Received signal 6
#0 0x000002678216 base::debug::StackTrace::StackTrace()
#1 0x000002678729 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#2 0x7f2e7e99d980 <unknown>
#3 0x7f2e2238d018 __GI_raise
#4 0x7f2e22377527 __GI_abort
#5 0x7f2e22721919 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
#6 0x7f2e2272cf3a __cxxabiv1::__terminate()
#7 0x7f2e2272cfa5 std::terminate()
#8 0x7f2e2272d1f7 __cxa_throw
#9 0x00000139c0e2 thrust::cuda_cub::throw_on_error()
#10 0x00000158dbd3 thrust::system::detail::generic::shuffle_copy<>()
......
```
The error points to this code:
```cuda
ThrustAllocator<cudaStream_t> thrust_allocator(
    GPUGraphTable<KeyType>::s_cuda_allocators[context.gpu_id], stream);
thrust::random::default_random_engine engine(context.shuffle_seed[tensor_pair_idx]);
const auto& exec_policy = thrust::cuda::par(thrust_allocator).on(stream);
thrust::shuffle_copy(
    exec_policy,
    cnt_iter,
    cnt_iter + context.total_row[tensor_pair_idx],
    thrust::device_pointer_cast(d_random_row),
    engine);
```
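To check whether `shuffle_copy` on a non-default stream fails on its own, independent of the project's allocator, a minimal standalone reproduction might look like the sketch below. It assumes stock Thrust as shipped with CUDA 12.2 and replaces the project-specific `ThrustAllocator` / `GPUGraphTable` with Thrust's default allocator. `shuffle_on_device` is a hypothetical helper, and the size and seed are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include <thrust/shuffle.h>
#include <thrust/sort.h>
#include <thrust/system_error.h>

// Hypothetical helper: shuffles [0, n) into a buffer on device `dev`, using a
// stream created while `dev` is current. Returns true if shuffle_copy did not
// throw and its output is a valid permutation of 0..n-1.
bool shuffle_on_device(int dev, int n, unsigned seed) {
  if (cudaSetDevice(dev) != cudaSuccess) return false;  // make `dev` current first
  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return false;  // bound to `dev`

  thrust::device_vector<int> out(n);  // allocated on the current device
  thrust::default_random_engine engine(seed);
  thrust::counting_iterator<int> first(0);
  bool ok = true;
  try {
    // Same call shape as above, but with Thrust's default allocator.
    thrust::shuffle_copy(thrust::cuda::par.on(stream), first, first + n,
                         out.begin(), engine);
    cudaStreamSynchronize(stream);
  } catch (const thrust::system::system_error& e) {
    std::printf("[GPU_ID: %d] shuffle_copy failed: %s\n", dev, e.what());
    ok = false;
  }
  cudaStreamDestroy(stream);
  if (!ok) return false;

  // Sanity check: sorting the shuffled output must give back 0..n-1.
  thrust::host_vector<int> h = out;
  thrust::sort(h.begin(), h.end());
  for (int i = 0; i < n; ++i)
    if (h[i] != i) return false;
  return true;
}
```

Calling `shuffle_on_device(dev, 1 << 20, 42u)` for every `dev` below the count reported by `cudaGetDeviceCount` and comparing GPU 0 with the others would show whether the failure reproduces without the project's allocator and stream handling.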
Then I added a check right before the call to log whether an earlier error was already pending:
```cuda
cudaStreamSynchronize(stream);
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
  LOG(NOTICE) << "[GPU_ID: " << context.gpu_id << "]CUDA Error before shuffle_copy: " << cudaGetErrorString(err);
} else {
  LOG(NOTICE) << "[GPU_ID: " << context.gpu_id << "]No error before shuffle_copy.";
}
// shuffle_copy calls
```
It turned out that only GPU 0 got an error; all the other GPUs were fine.
I suspected a problem with the stream, so I launched an empty test kernel on it before this code:
```cuda
// definition
__global__ void test_run_kernel(int gpu_id) {
  printf("[GPU_ID: %d] running in test_run_kernel!\n", gpu_id);
}
// ....
// before shuffle_copy
test_run_kernel<<<1, 1, 0, stream>>>(gpu_id);
// shuffle_copy calls
```
But it still failed with `cudaErrorInvalidResourceHandle`.
Then I tried launching it on the default stream instead:
```cuda
test_run_kernel<<<1, 1>>>(gpu_id);
```
That worked. Based on these results, I suspected the problem lies with `stream`, so I checked it with `cudaStreamQuery`, but it returned `cudaSuccess` every time.
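The CUDA C++ Programming Guide states that a kernel launch fails if it is issued to a stream that is not associated with the current device, so one possible check is to compare the device that owns `stream` with the current device right before the launch. The sketch below uses the CUDA driver API (`cuStreamGetCtx`, `cuCtxGetDevice`; link with `-lcuda`). `stream_device` is a hypothetical helper, and it assumes `stream` was created with `cudaStreamCreate` (not the default stream).

```cuda
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the ordinal of the device that owns `stream`,
// or -1 if the query fails. cudaStream_t and CUstream are the same handle
// type (struct CUstream_st*), so the runtime handle can be passed directly.
int stream_device(cudaStream_t stream) {
  CUcontext ctx = nullptr;
  if (cuStreamGetCtx(stream, &ctx) != CUDA_SUCCESS) return -1;

  // cuCtxGetDevice reports on the *current* context, so temporarily push
  // the stream's context, query it, then restore the previous one.
  if (cuCtxPushCurrent(ctx) != CUDA_SUCCESS) return -1;
  CUdevice dev = -1;
  CUresult r = cuCtxGetDevice(&dev);
  CUcontext popped = nullptr;
  cuCtxPopCurrent(&popped);
  return r == CUDA_SUCCESS ? static_cast<int>(dev) : -1;
}
```

For example, logging `stream_device(stream)` next to the value returned by `cudaGetDevice` just before `shuffle_copy` on each GPU would show whether the stream used on GPU 0 belongs to a different device than the one that is current at launch time.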
If any more information is needed, please let me know in the comments section.