TRT: CUDA lazy loading is not enabled

WARN 	TRT: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars	trt_utils.cpp:254

Running `export CUDA_MODULE_LOADING=LAZY` does not work; the warning still appears.

Hi,

Could you please share with us more details like complete verbose logs, minimal issue repro model/script and the following environment details,

TensorRT Version :
GPU Type :
Nvidia Driver Version :
CUDA Version :
CUDNN Version :
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Thank you.

Hi, I get the same error when I run the YOLOX demo (TensorRT/cpp). Here are my environment details:
TensorRT Version : 8.5.1.7
GPU Type : RTX 3060
Nvidia Driver Version : 516.94
CUDA Version : v11.7
CUDNN Version : 8.5.0.96
Operating System + Version : Windows 11 22621.963
Python Version (if applicable) : not used
TensorFlow Version (if applicable) : not used
PyTorch Version (if applicable) : not used

Hi,

Looks like you’re using the Windows platform. Please make sure the environment variable is added correctly for Windows (e.g. via System Properties > Environment Variables, or with the `setx` command).

Also, please refer to the following doc for more details on lazy loading:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/lazy-loading.html?highlight=environment%20variable#lazy-loading
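One common pitfall worth checking (an assumption, since we don’t see how the variable was added): on Windows, variables set through the system dialog or `setx` only reach processes started afterwards. A quick sketch of verifying what the current process actually sees (shown in Python; the same idea applies when launching the C++ demo from a fresh terminal):

```python
import os

# Variables added via System Properties or `setx` on Windows only apply to
# processes (terminals, IDEs) started after the change. Check what this
# process actually sees before blaming the CUDA side:
value = os.environ.get("CUDA_MODULE_LOADING", "<not set>")
print(value)
```

If this prints `<not set>`, the TensorRT process was launched from an environment that never received the variable, and the warning is expected.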

Thank you.

I’m also running TensorRT 8.5, but on Linux, and I can confirm it does not work:

CUDA_MODULE_LOADING=LAZY python3 onnx_to_tensorrt.py ../yolov7/runs/train/yolov7_master3/weights/best.onnx cones.engine

[12/26/2022-11:29:32] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
Loading ONNX file from path ../yolov7/runs/train/yolov7_master3/weights/best.onnx...
Beginning ONNX file parsing
[12/26/2022-11:29:32] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/26/2022-11:29:32] [TRT] [W] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
Completed parsing of ONNX file
Building an engine from file ../yolov7/runs/train/yolov7_master3/weights/best.onnx; this may take a while...
Completed creating Engine
[12/26/2022-11:31:58] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in CUDA C++ Programming Guide
[12/26/2022-11:31:58] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
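One more thing worth ruling out: CUDA reads `CUDA_MODULE_LOADING` when it initializes, so the variable must be in the process environment before anything touches CUDA. The command-line prefix above should achieve that, but if the script is under your control, a sketch of setting it inside the script itself (module names in the comment are examples, not taken from the script) looks like:

```python
import os

# Set this before importing anything that initializes CUDA
# (e.g. tensorrt, pycuda, torch), since CUDA reads it at initialization time.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# Only after this point import CUDA-touching modules, e.g.:
# import tensorrt as trt

print(os.environ["CUDA_MODULE_LOADING"])  # -> LAZY
```

Note that lazy loading also requires a sufficiently recent CUDA toolkit and driver, so on some setups the warning may persist regardless of the variable.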

I bet you can easily reproduce this by running the YOLO sample shipped with TensorRT. Most people just ignore this message; I’d suggest suppressing it if it isn’t adding value.