Tensor.cuda() low fps

I’m currently working on real-time pose recognition using the trt_pose library. However, I have encountered a problem with low frames per second (FPS) in my application. After investigating the issue, I discovered that a significant amount of time is being consumed in data preparation and result extraction.

Here is the code snippet I’m using:

import time
import torch

torch.cuda.current_stream().synchronize()
WIDTH = 256
HEIGHT = 256
data_cpu = torch.zeros((1, 3, HEIGHT, WIDTH), dtype=torch.float16)
data_cpu.cuda()  # Initialize CUDA

t0 = time.time()
print('Begin test at:', t0)
for i in range(50):
    data = data_cpu.cuda()
    # y = model_trt(data.cuda())
    # y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()

print(t1)
print('FPS:', 50.0 / (t1 - t0))

I would appreciate any suggestions on how to optimize this code to improve the FPS. I believe the bottleneck lies in the data preparation and result extraction steps, so ideas for optimizing those parts, or any other changes that would increase overall performance, are very welcome.

Thank you in advance for your help!

I am achieving approximately 7 FPS on my Jetson Xavier NX. However, when I remove the data copying step and only perform inference, the FPS increases to around 65.

Hi @jaclone ,

Thanks for reaching out!

7 FPS sounds quite slow for copying a 256x256 image.

Do you mind sharing the following information:

  1. Which JetPack version are you using?
  2. Which version of PyTorch are you using?
  3. What power configuration are you running (nvpmodel -q)?
  4. Did you call jetson_clocks before profiling?
  5. Do you see the same results using time.monotonic() or time.perf_counter()? These timers are guaranteed to be monotonic, while I believe time.time() is not.
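
For point 5, here is a minimal sketch of how the copy loop could be timed with time.perf_counter() instead of time.time(). The helper name `measure_fps` and its parameters are hypothetical, made up for this illustration, and are not part of PyTorch or trt_pose. It assumes that the GPU work should be drained with a synchronize call both before the clock starts and before it stops, and it falls back to printing a message if PyTorch or CUDA is not available.

```python
import time


def measure_fps(step, iterations=50, warmup=5, sync=None):
    """Return how many times per second step() runs (hypothetical helper).

    sync is an optional callable that blocks until queued work has
    finished, e.g. torch.cuda.synchronize for CUDA work.
    """
    if sync is None:
        sync = lambda: None  # nothing to wait for on CPU-only work

    # Warm-up iterations keep one-time costs (CUDA context creation,
    # allocator growth) out of the measurement.
    for _ in range(warmup):
        step()

    sync()  # make sure no earlier work is still pending
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    sync()  # wait for everything queued inside the loop
    elapsed = time.perf_counter() - start

    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Same tensor shape and dtype as in the original post.
        data_cpu = torch.zeros((1, 3, 256, 256), dtype=torch.float16)
        fps = measure_fps(lambda: data_cpu.cuda(), sync=torch.cuda.synchronize)
        print("Host-to-device copy FPS:", fps)
    else:
        print("PyTorch with CUDA is not available; nothing was measured.")
```

If this reports roughly the same number as the time.time() version, the timer is not the problem and the copy itself really is slow.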

Hopefully this information can help us point you in the right direction!

Thanks!
John

Hi, I'm using JetPack 5.1.1 and PyTorch 2.0, with NV Power Mode MODE_15W_6CORE.
However, the FPS was measured inside a Docker container. After I switched to the image nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, the FPS went up to 600.
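
Since the fix here was a different container image, a quick way to compare two containers is to print what PyTorch reports in each of them. The sketch below is only an illustration: `describe_environment` is a hypothetical helper name, and the thread does not say what was actually different about the first container, so this shows where to look rather than what the cause was.

```python
import importlib.util
import platform


def describe_environment():
    """Collect basic facts about the Python / PyTorch / CUDA setup.

    Hypothetical helper for comparing containers; the keys of the
    returned dict are chosen for this example only.
    """
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "torch": None,
    }

    # Do not fail if PyTorch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in both the old and the new container and comparing the output side by side should show whether the PyTorch build or its CUDA support differs between them.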

thanks a lot!
