Two identical trtexec processes running concurrently on the same GPU:
trtexec --loadEngine=fp16_fusion.trt --device=1 --useCudaGraph --iterations=1000000
trtexec --loadEngine=fp16_fusion.trt --device=1 --useCudaGraph --iterations=1000000
Result:
Starting inference
Cuda failure: an illegal memory access was encountered
[01/25/2024-18:06:34] [I] === Inference Options ===
[01/25/2024-18:06:34] [I] Iterations: 1000000
[01/25/2024-18:06:34] [I] Duration: 3s (+ 200ms warm up)
[01/25/2024-18:06:34] [I] Sleep time: 0ms
[01/25/2024-18:06:34] [I] Idle time: 0ms
[01/25/2024-18:06:34] [I] Streams: 1
[01/25/2024-18:06:34] [I] ExposeDMA: Disabled
[01/25/2024-18:06:34] [I] Data transfers: Enabled
[01/25/2024-18:06:34] [I] Spin-wait: Disabled
[01/25/2024-18:06:34] [I] Multithreading: Disabled
[01/25/2024-18:06:34] [I] CUDA Graph: Enabled
[01/25/2024-18:06:34] [I] Separate profiling: Disabled
[01/25/2024-18:06:34] [I] Time Deserialize: Disabled
[01/25/2024-18:06:34] [I] Time Refit: Disabled
[01/25/2024-18:06:34] [I] NVTX verbosity: 0
[01/25/2024-18:06:34] [I] Persistent Cache Ratio: 0
[01/25/2024-18:06:34] [I] Inputs:
[01/25/2024-18:06:34] [I] === Device Information ===
[01/25/2024-18:06:34] [I] Selected Device: NVIDIA GeForce RTX 4090
[01/25/2024-18:06:34] [I] Compute Capability: 8.9
[01/25/2024-18:06:34] [I] SMs: 128
[01/25/2024-18:06:34] [I] Compute Clock Rate: 2.52 GHz
[01/25/2024-18:06:34] [I] Device Global Memory: 24217 MiB
[01/25/2024-18:06:34] [I] Shared Memory per SM: 100 KiB
[01/25/2024-18:06:34] [I] Memory Bus Width: 384 bits (ECC disabled)
[01/25/2024-18:06:34] [I] Memory Clock Rate: 10.501 GHz
Environment:
docker run -itd --name xxx --gpus all nvcr.io/nvidia/pytorch:23.09-py3
With CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 set, the issue can still reproduce, but no CUDA core dump is generated. Why?
Under compute-sanitizer or cuda-gdb, I have not been able to reproduce the issue so far.
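Regarding the missing core dump: a hedged sketch of the environment variables I understand the CUDA driver consults (CUDA_COREDUMP_FILE and its %p expansion are described in the cuda-gdb documentation; the path below is only an example, and inside a container the target directory must be writable):

```shell
# Enable a GPU core dump when a device exception occurs.
export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1
# Optionally also dump host-side CPU state alongside the GPU dump.
export CUDA_ENABLE_CPU_COREDUMP_ON_EXCEPTION=1
# Output file pattern; %p expands to the PID, so two concurrent
# trtexec processes do not overwrite each other's dump.
export CUDA_COREDUMP_FILE=/tmp/cuda_coredump_%p
```

If the variables are set but no dump appears, it is worth checking that they are exported in the same shell that launches trtexec inside the container, not only on the host.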
With only one trtexec process running, the illegal memory access never happens; with two or more concurrent processes it happens rarely. So I suspect a memory conflict between the two processes.
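To make the intermittent failure easier to catch, here is a minimal, hypothetical repro sketch (the helper name `run_pair` and the log paths are my own; the command line is the one from this report):

```shell
# Run the same command in two concurrent background processes and
# report each exit status. For this report the command would be:
#   trtexec --loadEngine=fp16_fusion.trt --device=1 --useCudaGraph --iterations=1000000
run_pair() {
    "$@" > /tmp/proc1.log 2>&1 &
    pid1=$!
    "$@" > /tmp/proc2.log 2>&1 &
    pid2=$!
    wait "$pid1"; rc1=$?
    wait "$pid2"; rc2=$?
    echo "process 1 exit=$rc1, process 2 exit=$rc2"
}
```

Invoked as `run_pair trtexec --loadEngine=fp16_fusion.trt --device=1 --useCudaGraph --iterations=1000000` in a loop, a non-zero exit status on either process flags the rare illegal memory access, and the per-process logs preserve the failing output.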
[01/26/2024-13:29:27] [I] Starting inference
[New Thread 0x7fffa1fff000 (LWP 3603)]
[New Thread 0x7fffa17fe000 (LWP 3604)]
Cuda failure: an illegal memory access was encountered
Thread 6 "trtexec" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffa17fe000 (LWP 3604)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140735902900224) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140735902900224) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140735902900224) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140735902900224, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fffb75ee476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fffb75d47f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000000000041279e in sample::cudaCheck(cudaError, std::ostream&) [clone .part.174] [clone .constprop.741] ()
#6 0x000000000041a23e in void sample::(anonymous namespace)::inferenceExecution<nvinfer1::IExecutionContext>(sample::InferenceOptions const&, sample::InferenceEnvironment&, sample::(anonymous namespace)::SyncStruct&, int, int, int, std::vector<sample::InferenceTrace, std::allocator<sample::InferenceTrace> >&) ()
#7 0x0000000000411644 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(sample::InferenceOptions const&, sample::InferenceEnvironment&, sample::(anonymous namespace)::SyncStruct&, int, int, int, std::vector<sample::InferenceTrace, std::allocator<sample::InferenceTrace> >&), std::reference_wrapper<sample::InferenceOptions const>, std::reference_wrapper<sample::InferenceEnvironment>, std::reference_wrapper<sample::(anonymous namespace)::SyncStruct>, int, int, int, std::reference_wrapper<std::vector<sample::InferenceTrace, std::allocator<sample::InferenceTrace> > > > > >::_M_run() ()
#8 0x000000000045287f in execute_native_thread_routine ()
#9 0x00007fffb7640ac3 in start_thread (arg=) at ./nptl/pthread_create.c:442
#10 0x00007fffb76d1814 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
This looks like a CUDA Graph related issue. Could you please help analyze it? Thanks!