I’m using a Jetson Orin NX 8GB with JetPack 6.2.1 to run inference with a TensorRT engine. The model usually runs without issue, but last week I hit an unexpected error:
```
2025-08-22T17:41:57.773Z Loading /models/full_models/RKF/trt_models/r36.4/RKF_cinched_20250626_100_ARS_duplicate.engine for TensorRT inference...
2025-08-22T17:41:57.928Z [08/22/2025-10:41:57] [TRT] [I] Loaded engine size: 13 MiB
2025-08-22T17:41:58.072Z [08/22/2025-10:41:58] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
2025-08-22T17:41:58.174Z [08/22/2025-10:41:58] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 28 (MiB)
2025-08-22T17:41:58.369Z ERROR:root:Error predicting with TensorRT engine
2025-08-22T17:41:58.370Z Traceback (most recent call last):
  File "/cameractrl/utils/yolo_detector_models.py", line 154, in pred
    results = self.model.predict(img, device=self.device, imgsz=self.image_size, half=self.is_half, iou=self.iou_thres, verbose=False)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py", line 558, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/predictor.py", line 175, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 38, in generator_context
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/predictor.py", line 257, in stream_inference
    im = self.preprocess(im0s)
  File "/usr/local/lib/python3.10/dist-packages/ultralytics/engine/predictor.py", line 133, in preprocess
    im = im.half() if self.model.fp16 else im.float()  # uint8 to fp16/32
torch.AcceleratorError: CUDA error: device kernel image is invalid
Search for `cudaErrorInvalidKernelImage' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
Unfortunately, I have not been able to replicate the error. I tried repeatedly reloading the model and running inferences in a loop, but it never reappeared. The exact model is attached to this post.
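For reference, my replication attempt looked roughly like the sketch below (names and the dummy input frame are illustrative, not my production code, which feeds real camera frames through the Ultralytics `YOLO` wrapper seen in the traceback):

```python
import os

# Make CUDA errors surface synchronously at the failing call,
# as the PyTorch error message suggests.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

# Hypothetical local path to the attached engine file.
ENGINE_PATH = "RKF_cinched_20250626_100_ARS_duplicate.engine"


def stress_test(engine_path: str, iterations: int = 100) -> None:
    """Reload the engine and run one inference per iteration,
    trying to trigger the cudaErrorInvalidKernelImage failure."""
    # Imported lazily so the sketch parses without ultralytics installed.
    from ultralytics import YOLO
    import numpy as np

    for i in range(iterations):
        model = YOLO(engine_path, task="detect")  # fresh engine load each pass
        frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy image
        try:
            model.predict(frame, imgsz=640, half=True, verbose=False)
        except Exception as exc:
            print(f"iteration {i}: {type(exc).__name__}: {exc}")
            raise
        del model  # drop the execution context before the next reload
```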
Can I get some pointers on what might be causing this, how I can prevent it in the future, or how to replicate the error? Many thanks!