Description
I ran nano_llm.vision.video from the NanoLLM application using the Docker image dustynv/nano_llm:r36.3.0.
After running the application for 5 hours and monitoring its RSS, I found that memory usage grows continuously, i.e. there is a memory leak.
Could you tell me how to fix this memory leak?
Following the known memory-leak issue linked below, I have already added a gc.collect() call.
Additionally, since I do not need the video output, I removed the code related to the video_output variable.
Please see video.py under Relevant Files for the changes to /opt/NanoLLM/nano_llm/vision/video.py.
When I ran the same workload without the TensorRT models, RSS stayed nearly flat.
I therefore suspect that using the TensorRT model is what causes the leak.
Environment
TensorRT Version: 8.6.2
GPU Type: Jetson Orin
CUDA Version: 12.2
CUDNN Version: 8.9.4
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
PyTorch Version (if applicable): 2.2.0
Baremetal or Container (if container which image + tag): Container dustynv/nano_llm:r36.3.0
Relevant Files
The model used was Efficient-Large-Model/VILA1.5-3b.
It is automatically downloaded at runtime.
The following is a script for executing VLM.
run.sh.txt (284 Bytes)
(Due to restrictions on uploadable file extensions, “.txt” has been appended to the attached file names.)
I made the following changes to /opt/NanoLLM/nano_llm/vision/video.py.
video.py.txt (3.7 KB)
diff --git a/nano_llm/vision/video.py b/nano_llm/vision/video.py
index aa32878..336d304 100644
--- a/nano_llm/vision/video.py
+++ b/nano_llm/vision/video.py
@@ -24,6 +24,7 @@ from nano_llm.plugins import VideoSource, VideoOutput
 from termcolor import cprint
 from jetson_utils import cudaMemcpy, cudaToNumpy, cudaFont
+import gc
 
 # parse args and set some defaults
 parser = ArgParser(extras=ArgParser.Defaults + ['prompt', 'video_input', 'video_output'])
 
@@ -72,15 +73,11 @@ def on_video(image):
     if last_text:
         font_text = remove_special_tokens(last_text)
         wrap_text(font, image, text=font_text, x=5, y=5, color=(120,215,21), background=font.Gray50)
-    video_output(image)
 
 video_source = VideoSource(**vars(args), cuda_stream=0)
 video_source.add(on_video, threaded=False)
 video_source.start()
 
-video_output = VideoOutput(**vars(args))
-video_output.start()
-
 font = cudaFont()
 
 # apply the prompts to each frame
@@ -123,8 +120,8 @@ while True:
     if num_images >= args.max_images:
         chat_history.reset()
+        gc.collect()
         num_images = 0
 
     if video_source.eos:
-        video_output.stream.Close()
         break
 
The following is the script used to record the RSS log.
RSS is sampled every 5 seconds using the ps aux command.
ps_5sec.bash.txt (197 Bytes)
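The logger essentially does the following; this is an illustrative sketch (the attached ps_5sec.bash may differ slightly), with an added sample-count argument for finite runs:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the RSS logger (the attached ps_5sec.bash may
# differ). Usage: ./ps_5sec.bash [logfile] [interval_sec] [samples]
# samples=0 means run until interrupted, as described above.
LOGFILE="${1:-psinfo.log}"
INTERVAL="${2:-5}"
SAMPLES="${3:-1}"

i=0
while :; do
    # One `ps aux` snapshot plus timestamp per sample; the [v] in the
    # pattern keeps grep from matching its own process entry.
    { ps aux | grep '[v]ideo.py'; date '+%F %T'; } >> "$LOGFILE"
    i=$((i + 1))
    if [ "$SAMPLES" -ne 0 ] && [ "$i" -ge "$SAMPLES" ]; then
        break
    fi
    sleep "$INTERVAL"
done
```

For the long-running measurement it would be invoked as in the steps below, e.g. `nohup bash ./ps_5sec.bash psinfo.log 5 0 &`.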
The following log file contains the RSS measurements when running the VLM with the TensorRT model.
psinfo.log (1.2 MB)
The following log file contains the RSS measurements when running the VLM without the TensorRT model.
psinfo.log (1.2 MB)
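To quantify the growth in such a log, the RSS column (field 6 of ps aux output, in KiB) can be summarized. A small helper like the following (illustrative, assuming the log holds raw ps aux lines for the video.py process) prints the first and last samples and their difference:

```shell
# Summarize RSS growth from a psinfo-style log of raw `ps aux` lines
# (assumed format: RSS is field 6, in KiB; video.py identifies the VLM).
rss_delta() {
    awk '/video\.py/ {
             rss = $6
             if (first == 0) first = rss   # remember the first sample
             last = rss                    # keep updating the last one
         }
         END {
             printf "first=%d KiB last=%d KiB delta=%d KiB\n",
                    first, last, last - first
         }' "$1"
}
```

For example, `rss_delta psinfo.log` shows the net RSS change over the run; a large positive delta after hours of steady-state inference is the leak signature I am seeing with the TensorRT model.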
Steps To Reproduce
When using a TensorRT model
Launch a Docker container
sudo docker run -itd --runtime=nvidia --device=/dev/video0:/dev/video0 -v ${PWD}:${PWD} -w ${PWD} -e PYTHONPATH=/opt/clip_trt:/opt/NanoLLM:/opt/NanoDB:/opt/faiss_lite dustynv/nano_llm:r36.3.0
sudo docker ps
sudo docker exec -it <container name> bash
Download the models and save the TensorRT models
mkdir -p -m 777 /data/models/mlc/dist/models
mkdir -p -m 777 /data/models/clip
cp video.py /opt/NanoLLM/nano_llm/vision/video.py
bash ./run.sh
# Wait until inference begins.
# Exit the program with Ctrl + c.
Restart Jetson
exit
sudo docker stop <container name>
sudo reboot
Launch a Docker container
sudo docker start <container name>
sudo docker exec -it <container name> bash
Run the VLM program
nohup bash ./run.sh &
Run a program to get the memory log
exit
nohup bash ./ps_5sec.bash psinfo.log &
When not using a TensorRT model
Launch a Docker container
sudo docker run -itd --runtime=nvidia --device=/dev/video0:/dev/video0 -v ${PWD}:${PWD} -w ${PWD} -e PYTHONPATH=/opt/clip_trt:/opt/NanoLLM:/opt/NanoDB:/opt/faiss_lite dustynv/nano_llm:r36.3.0
sudo docker ps
sudo docker exec -it <container name> bash
Download the models
(Do not create the /data/models/clip directory.)
mkdir -p -m 777 /data/models/mlc/dist/models
cp video.py /opt/NanoLLM/nano_llm/vision/video.py
bash ./run.sh
# Wait until inference begins.
# Exit the program with Ctrl + c.
Restart Jetson
exit
sudo docker stop <container name>
sudo reboot
Launch a Docker container
sudo docker start <container name>
sudo docker exec -it <container name> bash
Run the VLM program
nohup bash ./run.sh &
Run a program to get the memory log
exit
nohup bash ./ps_5sec.bash psinfo.log &

