• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.2
• JetPack Version (valid for Jetson only):
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only): 525
• Issue Type (questions, new requirements, bugs): questions
I have a question about running DeepStream with Triton Inference Server for an ONNX model with batch_size=1. I read the deepstream-test3 sample and found the line if(pgie_batch_size != number_sources). If I understand correctly, when multiple sources are given, the model will run with batch_size > 1. However, if my ONNX model only supports batch_size=1, how will DeepStream handle it? For example, will it run inference sequentially on each input source (batch_size=1) before running the downstream components?
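For context, the check referenced above boils down to overriding the pgie batch-size with the number of sources. A rough Python sketch of that logic (the exact message and variable names in the deepstream-test3 sample may differ; `pgie` is the inference element and `number_sources` the stream count):

```python
# Sketch of the batch-size override in deepstream-test3.
pgie_batch_size = pgie.get_property("batch-size")
if pgie_batch_size != number_sources:
    print("WARNING: Overriding infer-config batch-size",
          pgie_batch_size, "with number of sources", number_sources)
    pgie.set_property("batch-size", number_sources)
```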
Yes, the DeepStream nvinferserver plugin will send batched data to the Triton server, and the number of frames in a batch will not exceed nvinferserver's batch-size. nvinferserver is open source in DeepStream 6.2; you can check GstNvInferServerImpl::processFullFrame.
nvinferserver leverages Triton to do the model inference; Triton will report an error if the received data's batch size is larger than the model's max_batch_size.
But according to if(pgie_batch_size != number_sources), the code will override batch_size to 3 if three streams are used, is that correct? If so, will we get an error because three streams (batch_size=3) don't match the batch size required by the model? Or is there any way to run three streams with an ONNX model that only supports batch_size=1?
Sorry if my question is not clear. My concern is that streammux's batch-size=number_sources and the code overrides pgie_batch_size=number_sources. If three streams are used, then number_sources=3. In this case, you said I can set pgie's batch-size to 1 if the model only supports batch_size=1, but then streammux_batch_size=3 and pgie_batch_size=1 won't match?
In other words, streammux will create a batch of size 3 while pgie only expects batch_size=1; won't the code crash?
Yes. If they do not match, nvinferserver will only send up to max-batch-size frames at a time; please refer to GstNvInferServerImpl::processFullFrame, especially this loop:

for (uint32_t batchIdx = 0; batchIdx < numFilled && batchIdx < maxBatchSize(); batchIdx++)
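In effect, a streammux batch larger than the model's max batch size is split into sequential sub-batches before being sent to Triton. A minimal Python sketch of that chunking idea (illustration only, not the actual C++ implementation; all names here are hypothetical):

```python
def split_into_sub_batches(frames, max_batch_size):
    """Split a streammux batch into chunks no larger than max_batch_size,
    mimicking how nvinferserver caps each request sent to Triton."""
    for start in range(0, len(frames), max_batch_size):
        yield frames[start:start + max_batch_size]

# With 3 streams and a model that only supports batch_size=1,
# the 3-frame batch becomes three sequential batch-1 inferences:
for sub_batch in split_into_sub_batches(["frame_src0", "frame_src1", "frame_src2"], 1):
    print(sub_batch)  # ['frame_src0'], then ['frame_src1'], then ['frame_src2']
```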
@fanzh Another question: if my ONNX model can support batch_size=3 and I run output parsing in Python, how can I know which output comes from which stream inside pgie_src_pad_buffer_probe, so that I can draw bounding boxes and pose visualization for each stream and then send the visualization back to the right stream?
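For what it's worth, the usual pattern with the pyds Python bindings (a minimal sketch, assuming pyds is installed; this is not an answer from the thread) is to walk the batch metadata in the probe: each NvDsFrameMeta carries source_id / pad_index identifying the originating stream, and batch_id giving the frame's slot in the batched tensor, so output slot batch_id maps back to stream source_id:

```python
import pyds
from gi.repository import Gst

def pgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Batch metadata holds one NvDsFrameMeta per batched frame.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # source_id / pad_index identify the originating stream;
        # batch_id is the frame's slot in the batched tensor, so
        # model output[batch_id] belongs to stream source_id.
        print("stream", frame_meta.source_id,
              "batch slot", frame_meta.batch_id,
              "frame", frame_meta.frame_num)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```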