How to optimize RAM usage on the Jetson

Hello, I am using Python to develop my YOLOv8 application on a Jetson Orin Nano. I decode two H.264 camera streams and perform inference with a .engine file. There are two things we noticed:

  1. RAM usage is extremely high; we can basically do nothing else once the application is running.
  2. The preprocessing and postprocessing are time-consuming (and I think the postprocessing does not include tracking).

Has anyone tried YOLOv8 + BoT-SORT with Python on any Jetson model? Please offer some advice on optimizing this.

Also, I am trying to develop a C++ application on the Jetson. Since the Jetson shares RAM between the CPU and GPU, maybe I can do something like avoiding cv::cuda::GpuMat?


Could you share more info about your application?
Are you using the frameworks from ultralytics?


Hello, AastaLLL:

Yes, I am using the Ultralytics API. Could you tell me what kind of information you need? I really appreciate your attention, thanks.

Hi, Aasta, this is how our application works, basically:

  1. decode the H.264 RTSP camera streams with GStreamer. (This model doesn't have DeepStream; I failed to install it, so I need options other than DS.)
  2. extract frame data in a pull-sample callback like this:
```python
def new_buffer(sink, data):
    # pull a sample from the appsink
    sample = sink.emit("pull-sample")
    buf = sample.get_buffer()
    caps = sample.get_caps()
    width = caps.get_structure(0).get_value('width')
    height = caps.get_structure(0).get_value('height')
    print("frame size: {}, {}".format(width, height))
    # copy the buffer into an RGBA image array
    arr = np.ndarray(
        shape=(height, width, 4),
        dtype=np.uint8,
        buffer=buf.extract_dup(0, buf.get_size()))
    # keep only the first 3 channels
    data.img = arr[:, :, 0:3]
    data.frame_shape = data.img.shape
    # wrap-around frame counter
    if data.frame < 10000:
        data.frame = data.frame + 1
    else:
        data.frame = 0
    return Gst.FlowReturn.OK
```
  3. make a copy of the frame data from the buffer for inference, in case frames get dropped.
  4. call the Ultralytics YOLOv8 API, e.g. model.track().
  5. get the tracking results and draw the bounding boxes on the frame data with cv2.rectangle().

I think we could call the infer method directly with the GStreamer buffer, since the GPU and CPU share this 8 GB of RAM.


Sorry for the late update.

Ultralytics generally uses TensorRT for inference, which depends on the CUDA library.
Loading CUDA takes memory (>600 MB), since it needs to load all the modules into memory.

In CUDA 11.8 we introduced lazy module loading, which lets users load only the required CUDA modules and can help reduce memory usage.
Please give it a try (JetPack 6 with CUDA 12).
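For reference, lazy loading is controlled by the CUDA_MODULE_LOADING environment variable, and it has to be set before the first CUDA context is created. In Python that means setting it before importing the CUDA-dependent packages, e.g.:

```python
import os

# Must be set before any CUDA-using library (torch, ultralytics,
# tensorrt) is imported; otherwise the modules load eagerly anyway.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# ... now import the CUDA-dependent packages:
# from ultralytics import YOLO
```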

Moreover, it looks like your preprocessing and postprocessing are CPU-based.
If so, it's expected to take time, since memory transfers (CPU ↔ GPU) are required.
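To make the CPU cost concrete: the usual YOLO preprocessing (normalize to [0, 1], HWC → CHW, add a batch dimension) touches every pixel on the CPU before the result is transferred to the GPU. A minimal NumPy sketch of those steps (the exact Ultralytics internals may differ, e.g. letterbox resizing is omitted here):

```python
import numpy as np

def preprocess(frame):
    """frame: HxWx3 uint8 RGB image -> 1x3xHxW float32 tensor."""
    x = frame.astype(np.float32) / 255.0  # uint8 -> float, normalize
    x = np.transpose(x, (2, 0, 1))        # HWC -> CHW
    return x[np.newaxis, ...]             # add batch dimension

frame = np.zeros((640, 640, 3), dtype=np.uint8)
out = preprocess(frame)
assert out.shape == (1, 3, 640, 640)
```

Running the same ops as GPU kernels, on memory the GPU already owns, removes both the per-pixel CPU work and the transfer.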


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.