Description
How should I modify the following code to handle dynamic input? The input is a sequence whose length varies at runtime. The code is taken from the demo.
def allocate_buffers(self, engine, inputs):
    """
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    """
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer address to device bindings. When cast to int,
        # it's a linear index into the context's memory (like a memory address).
        # See https://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation
        bindings.append(int(device_mem))
        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
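For dynamic shapes, one possible approach is to size each input buffer once for the largest shape the optimization profile allows, rather than from `engine.get_tensor_shape`, which reports -1 for a dynamic dimension. The sketch below is plain Python so it runs without TensorRT. `max_nbytes` is a hypothetical helper, and it assumes the `(min, opt, max)` triple comes from `engine.get_tensor_profile_shape(tensor_name, 0)` for an input tensor. Output shapes are not part of the profile, so this also assumes they would be read from `context.get_tensor_shape(tensor_name)` after every input shape has been set with `context.set_input_shape`.

```python
from math import prod

def max_nbytes(profile_shapes, itemsize):
    """Bytes needed to hold the largest shape of one input tensor.

    profile_shapes: (min, opt, max) shapes, in the order assumed to match
    what engine.get_tensor_profile_shape(name, profile_index) returns.
    itemsize: bytes per element, e.g. 4 for float32.
    """
    _min_shape, _opt_shape, max_shape = profile_shapes
    return prod(max_shape) * itemsize

# Shapes for "z" taken from the profile in this post; float32 is an assumption.
z_profile = [(1, 192, 128), (1, 192, 512), (1, 192, 1024)]
z_bytes = max_nbytes(z_profile, itemsize=4)
```

Allocating for the maximum once avoids reallocating whenever the sequence length changes; only the bytes covered by the current shape would need to be copied per inference.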
I also don't quite understand what the following code does. Does it have any real effect?
profile.set_shape("z", (1, 192, 128), (1, 192, 512), (1, 192, 1024))
profile.set_shape("s_stft_real", (1, 17, 16385), (1, 17, 65537), (1, 17, 131072))
profile.set_shape("s_stft_imag", (1, 17, 16385), (1, 17, 65537), (1, 17, 131072))
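For context, each `set_shape` call declares the minimum, optimum, and maximum shape that the engine should accept for that input, so the calls do have an effect, provided the profile is added to the builder config (for example with `config.add_optimization_profile(profile)`). A shape passed at runtime is expected to lie within the [min, max] range in every dimension. A minimal plain-Python sketch of that range check follows; `fits_profile` is a hypothetical helper for illustration, not a TensorRT function.

```python
def fits_profile(shape, min_shape, max_shape):
    """True if every dimension of shape lies within [min, max]."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))

# Range declared for "z" in the set_shape call above.
z_min, z_max = (1, 192, 128), (1, 192, 1024)

ok = fits_profile((1, 192, 300), z_min, z_max)         # inside the range
too_long = fits_profile((1, 192, 2048), z_min, z_max)  # exceeds the max
```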
And why is the size computed by the following code -192 for "z"? Why isn't it the size set by set_shape?
size = trt.volume(engine.get_tensor_shape(tensor_name))
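The -192 is consistent with the engine reporting the dynamic dimension of "z" as -1: assuming `engine.get_tensor_shape("z")` returns (1, 192, -1), the product of the dimensions is negative. The sketch below reproduces the arithmetic in plain Python; `volume` is a stand-in for `trt.volume`, which is assumed here to be a plain product of the dimensions.

```python
from math import prod

def volume(shape):
    """Stand-in for trt.volume: product of all dimensions."""
    return prod(shape)

# Assumed engine-level shape of "z": the sequence axis is dynamic, so it is -1.
engine_shape = (1, 192, -1)
# A concrete shape chosen at runtime (it must lie within the profile range).
runtime_shape = (1, 192, 512)

print(volume(engine_shape))   # -192
print(volume(runtime_shape))  # 98304
```

The values passed to `set_shape` only bound the allowed range; the engine itself keeps -1 for that axis until a concrete input shape is set on the execution context.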
Can someone clear this up for me? Thanks a lot!
Environment
TensorRT Version: 10
GPU Type: A30
Nvidia Driver Version: 525.105.17
CUDA Version: 11.8
CUDNN Version: 8.9.7
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 2.4.0+cu118