Tensor.cuda() low fps

jaclone · June 19, 2023, 3:00am

I’m currently working on real-time pose recognition using the trt_pose library. However, I have encountered a problem with low frames per second (FPS) in my application. After investigating the issue, I discovered that a significant amount of time is being consumed in data preparation and result extraction.

Here is the code snippet I’m using:

import time
import torch

torch.cuda.current_stream().synchronize()
WIDTH = 256
HEIGHT = 256
data_cpu = torch.zeros((1, 3, HEIGHT, WIDTH), dtype=torch.float16)
data_cpu.cuda()  # Initialize CUDA

t0 = time.time()
print('Begin test at:', t0)
for i in range(50):
    data = data_cpu.cuda()
    # y = model_trt(data.cuda())
    # y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()

print(t1)
print('FPS:', 50.0 / (t1 - t0))

I would appreciate any suggestions or insights on how to optimize the code to improve the FPS. Specifically, I believe the bottleneck lies in the data preparation and result extraction steps. Please let me know if you have any ideas on how to optimize these parts of the code, or if there are other improvements I can make to increase overall performance.

Thank you in advance for your help!

jaclone · June 19, 2023, 3:05am

I am achieving approximately 7 FPS on my Jetson Xavier NX. However, when I remove the data copying step and only perform inference, the FPS increases to around 65.

jaybdub · June 20, 2023, 6:42pm

Hi @jaclone ,

Thanks for reaching out!

7 FPS sounds quite slow for copying a 256x256 image.
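As a rough back-of-the-envelope check of why that figure looks slow, the sketch below computes the size of the tensor from the original snippet and the host-to-device throughput each reported FPS would imply. It assumes, for illustration only, that the entire frame time is spent on the copy; `implied_throughput_mb_s` is a hypothetical helper name, not part of any library.

```python
# Tensor shape and dtype taken from the snippet in the first post.
BATCH, CHANNELS, HEIGHT, WIDTH = 1, 3, 256, 256
BYTES_PER_FLOAT16 = 2

# Bytes moved to the GPU on every iteration.
payload_bytes = BATCH * CHANNELS * HEIGHT * WIDTH * BYTES_PER_FLOAT16


def implied_throughput_mb_s(fps, payload=payload_bytes):
    """Throughput in MB/s if each frame did nothing but copy `payload` bytes.

    Assumption: the whole frame time is the copy, so this is a lower bound
    on the real transfer rate.
    """
    return payload * fps / 1e6


print(payload_bytes)                            # 393216 bytes (384 KiB)
print(round(implied_throughput_mb_s(7), 2))     # 2.75 MB/s at 7 FPS
print(round(implied_throughput_mb_s(600), 2))   # 235.93 MB/s at 600 FPS
```

A few megabytes per second is far below what a device like this would normally be expected to manage for a copy of this size, which suggests the problem is in the software environment rather than in the copy itself.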

Do you mind sharing the following information:

Which JetPack version are you using?
Which version of PyTorch are you using?
What power configuration are you running (nvpmodel -q)?
Did you call jetson_clocks before profiling?
Do you experience the same results using time.monotonic() or time.perf_counter()? These timers are guaranteed to be monotonic, whereas I believe time.time() is not.
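A minimal sketch of the timer suggestion, assuming a generic benchmark loop: `measure_fps` is a hypothetical helper (not part of PyTorch or trt_pose) that runs a few warm-up iterations, times the loop with time.perf_counter(), and calls an optional `sync` callback before reading the clock so that asynchronous GPU work is included in the measurement.

```python
import time


def measure_fps(step, iterations=50, warmup=5, sync=None):
    """Return iterations per second for `step`, timed with time.perf_counter().

    `sync` is an optional callable that blocks until pending asynchronous
    work has finished (for example torch.cuda.synchronize).
    """
    # Warm-up runs absorb one-time costs such as CUDA context creation.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work is done before starting the clock

    start = time.perf_counter()
    for _ in range(iterations):
        step()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Hypothetical usage with the tensor from the first post (requires PyTorch
# and a CUDA device, so it is left commented out here):
# import torch
# data_cpu = torch.zeros((1, 3, 256, 256), dtype=torch.float16)
# print('FPS:', measure_fps(lambda: data_cpu.cuda(), sync=torch.cuda.synchronize))
```

Passing the synchronization call in as a parameter keeps the helper free of any PyTorch dependency, so the same loop can time CPU-only steps for comparison.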

Hopefully this information can help us point you in the right direction!

Thanks!
John

jaclone · June 21, 2023, 12:37pm

Hi, I am using JetPack 5.1.1 and PyTorch 2.0, with NV Power Mode: MODE_15W_6CORE.
However, the FPS was measured inside a Docker container. After I switched to the image nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, the FPS went up to 600.

thanks a lot!
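Since the fix turned out to be the container image, it can help to print what the Python environment inside a container actually sees. The following is only a sketch: `describe_environment` is a hypothetical helper, and it falls back gracefully when PyTorch is not installed.

```python
import platform


def describe_environment():
    """Collect basic version information from the current Python environment."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here; leave the torch fields as None.
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in both the slow and the fast container would show whether the two images differ in PyTorch build or CUDA support.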

system · July 12, 2023, 7:15am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
