I’m currently working on real-time pose recognition using the trt_pose library. However, I have encountered a problem with low frames per second (FPS) in my application. After investigating the issue, I discovered that a significant amount of time is being consumed in data preparation and result extraction.
Here is the code snippet I’m using:
import time
import torch

torch.cuda.current_stream().synchronize()
WIDTH = 256
HEIGHT = 256
data_cpu = torch.zeros((1, 3, HEIGHT, WIDTH), dtype=torch.float16)
data_cpu.cuda()  # Initialize CUDA
t0 = time.time()
print('Begin test at:', t0)
for i in range(50):
    data = data_cpu.cuda()  # Host-to-device copy of one frame
    # y = model_trt(data.cuda())
    # y = model_trt(data)
    torch.cuda.current_stream().synchronize()  # Wait for the GPU before the next frame
t1 = time.time()
print(t1)
print('FPS:', 50.0 / (t1 - t0))
I would appreciate any suggestions on how to improve the FPS. I believe the bottleneck lies in the data preparation and result extraction steps, so ideas for optimizing those parts, or any other changes that would increase overall performance, are especially welcome.
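To make suggestions easier to compare, each stage (host-to-device copy, inference, result extraction) could be timed on its own with a small helper like the sketch below. The names `measure_fps`, `step`, and `sync` are hypothetical and not part of trt_pose or PyTorch; the helper only assumes that `step` runs one iteration of the stage being measured and that `sync`, if given, blocks until pending GPU work has finished.

```python
import time


def measure_fps(step, iterations=50, warmup=5, sync=None):
    """Return iterations per second for step(), excluding warm-up runs."""
    # Warm-up runs absorb one-time costs such as CUDA initialization.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # Make sure warm-up work is finished before timing starts.

    start = time.perf_counter()
    for _ in range(iterations):
        step()
    if sync is not None:
        sync()  # Include any still-pending asynchronous work in the timing.
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

With the snippet above, the copy alone could be measured as `measure_fps(lambda: data_cpu.cuda(), sync=torch.cuda.synchronize)`, and the model call could be measured the same way on a tensor that is already on the GPU.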
Thank you in advance for your help!