import time

import torch

# Synchronize before starting the timer so previously queued GPU work
# is not counted against the benchmark.
torch.cuda.current_stream().synchronize()
t0 = time.time()
for i in range(50):
    y = model_trt(data)
# Synchronize again so all 50 inferences have actually finished
# before the timer stops (CUDA calls are asynchronous).
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))  # iterations per second
It reported a value of about 5~6 iterations per second.
I think it should be more than 10, but it's low.
I suspect that's because it's not using the GPU at all.
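For what it's worth, a more robust version of the loop above adds warmup iterations (to exclude one-time lazy initialization and clock ramp-up) and uses `time.perf_counter`. A minimal sketch; the `fn` and `sync` callables are placeholders for `lambda: model_trt(data)` and the CUDA stream synchronize from the snippet above:

```python
import time


def benchmark(fn, warmup=10, iters=50, sync=None):
    """Return iterations/sec for fn(), excluding warmup runs.

    sync: optional callable that flushes pending asynchronous work
    (e.g. a CUDA stream synchronize) so the timer only measures
    work that has actually completed.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    t1 = time.perf_counter()
    return iters / (t1 - t0)
```

On the Jetson this would be called as something like `benchmark(lambda: model_trt(data), sync=torch.cuda.current_stream().synchronize)`.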
Next, do you need the swap memory to make the inference work?
Since swap is backed by disk, it may add overhead to the pipeline.
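To see whether swap is actually being touched during inference, you can read it straight from /proc/meminfo on the Jetson (standard Linux procfs field names; the helper below is just a sketch):

```python
def swap_usage_kb(meminfo_text=None):
    """Return (swap_total_kb, swap_used_kb) parsed from /proc/meminfo."""
    if meminfo_text is None:
        with open("/proc/meminfo") as f:
            meminfo_text = f.read()
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])  # procfs reports values in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free
```

If swap used stays near zero while the benchmark runs, disk-backed swap is unlikely to be the bottleneck.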
Hi, thank you for replying.
After I maximized the performance mode and disabled the extra swap memory,
it's been a little better but still too slow.
The benchmark score estimated with the code below
import time

import torch

# Synchronize before starting the timer so previously queued GPU work
# is not counted against the benchmark.
torch.cuda.current_stream().synchronize()
t0 = time.time()
for i in range(50):
    y = model_trt(data)
# Synchronize again so all 50 inferences have actually finished
# before the timer stops (CUDA calls are asynchronous).
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))  # iterations per second
is now 7~8, but I think it's still low,
and when I check with jtop,
it's still not using any GPU resources.
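The GPU load that jtop displays on Jetson boards is typically exposed through sysfs, so it can also be polled directly while the benchmark runs. The path below (`/sys/devices/gpu.0/load`, reporting load in tenths of a percent) is an assumption that varies by board and L4T version; treat this as a sketch:

```python
def gpu_load_percent(raw=None, path="/sys/devices/gpu.0/load"):
    """Read Jetson iGPU load as a percentage.

    raw: optional pre-read file contents (for testing); otherwise the
    sysfs file at `path` is read. The assumed path reports load in
    tenths of a percent, so 500 means 50% utilization.
    """
    if raw is None:
        with open(path) as f:
            raw = f.read()
    return int(raw.strip()) / 10.0
```

Polling this in a loop during inference should show whether the TensorRT engine is really running on the GPU: a reading stuck at 0 while the benchmark runs would confirm what jtop shows.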