I'm using the TensorRT Python API and can't find documentation about batch inputs for Python. Based on the C++ docs, I tried to run some tests with TensorRT.
I trained a simple model in PyTorch and built an engine from it. The model input is (3, 224, 224) and the output is (2) (probabilities for 2 classes). The engine's max_batch_size is 8. TensorRT inference works fine for single-image inputs: I prepared a 1D buffer of size 3*224*224 for one image and got back 2 probabilities for the two classes. Now I'm trying to run batch tests and execute the engine with batch_size=4. I altered the sample buffer allocation:
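To sanity-check the sizing, here is a standalone numpy sketch of the buffer math described above. The shapes are taken from the post, not read from a real engine, so this stands in for trt.volume(engine.get_binding_shape(...)):

```python
import numpy as np

# Shapes assumed from the post (not queried from an actual engine).
batch_size = 4
input_shape = (3, 224, 224)   # engine binding 0
output_shape = (2,)           # engine binding 1

input_volume = int(np.prod(input_shape))    # elements per image
output_volume = int(np.prod(output_shape))  # elements per prediction

# Flat host buffers must hold batch_size copies of each binding.
h_input_size = batch_size * input_volume
h_output_size = batch_size * output_volume
print(h_input_size, h_output_size)  # 602112 8
```

So after a batch-4 run, h_output should hold 8 floats that can be viewed as a (4, 2) array, one probability pair per image.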
h_input = pycuda.driver.pagelocked_empty(4 * trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(ModelData.DTYPE))
h_output = pycuda.driver.pagelocked_empty(4 * trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(ModelData.DTYPE))
d_input = pycuda.driver.mem_alloc(h_input.nbytes)
d_output = pycuda.driver.mem_alloc(h_output.nbytes)
stream = pycuda.driver.Stream()
I prepared the images: resized them, transposed the channels, and put them into a numpy array of shape (4, 3, 224, 224). Then I prepared a 1D pagelocked buffer of size 4*3*224*224 for the 4-image batch:
norm_images = norm_images.astype(trt.nptype(ModelData.DTYPE)).ravel()
np.copyto(h_input, norm_images)
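Note that astype() and ravel() must be called (the original code had as_type and a bare .ravel, which copies the method object instead of the data). A pure-numpy sketch of the batching step, with random arrays standing in for real resized images:

```python
import numpy as np

# Random HWC images stand in for 4 real resized 224x224 RGB inputs.
rng = np.random.default_rng(0)
images_hwc = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(4)]

# Transpose each image to CHW and stack into one NCHW batch.
batch = np.stack([img.transpose(2, 0, 1) for img in images_hwc])

# astype() and ravel() are method calls; the result is one flat buffer
# laid out image-after-image, which is what the engine expects.
flat = batch.astype(np.float32).ravel()
print(batch.shape, flat.shape)  # (4, 3, 224, 224) (602112,)
```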
pycuda.driver.memcpy_htod_async(d_input, h_input, stream)
context.execute_async(batch_size=4, bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
pycuda.driver.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
After the test I got a list of 4 identical pairs of numbers, so I guess I'm doing something wrong. Are there any examples of batch inference with the TensorRT Python API?