Where do the frame/video indices come from in the DeepStream detection sample?

haifengli · December 24, 2017, 11:16am

I have several questions about DeepStream's internal workflow, that is, the layers that sit above TensorRT but below DeepStream user code in the technology stack.

1. Where do frameIndex and videoIndex come from in the DeepStream detection sample?
Here is the code in the parserModule_resnet18.h file:

BBOXS_PER_FRAME bboxs;
bboxs.frameIndex = trace_0[iB].frameIndex;
bboxs.videoIndex = trace_0[iB].videoIndex;
bboxs.nBBox = 0;

Since the .prototxt file at the TensorRT level only knows about a four-dimensional [BCHW] tensor, how can DeepStream retrieve videoIndex from the inference module's output and route it into the parser module's input?
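The general idea behind this question can be illustrated without DeepStream itself. The sketch below is hypothetical (TraceInfo, TaggedTensor, makeBatch and infer are invented names, not DeepStream API): it assumes the per-frame indices travel in a side vector with one entry per batch slot, next to the [BCHW] data, so the network never needs to know about them while a later module can still look up the video for output row b.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame metadata; the field names mirror the
// frameIndex/videoIndex members read in parserModule_resnet18.h.
struct TraceInfo {
    int frameIndex;
    int videoIndex;
};

// Hypothetical tensor wrapper: the pixels are a plain NCHW float buffer,
// while the metadata sits in a separate vector, one entry per batch slot.
// The network only ever sees `data`.
struct TaggedTensor {
    int n, c, h, w;
    std::vector<float> data;        // n * c * h * w values
    std::vector<TraceInfo> traces;  // traces[b] describes batch slot b
};

// Decoder side: build a batch from frames of several videos and record
// where each slot came from (pixel copying omitted; data is zero-filled).
TaggedTensor makeBatch(const std::vector<TraceInfo>& sources, int c, int h, int w) {
    TaggedTensor t{static_cast<int>(sources.size()), c, h, w, {}, sources};
    t.data.assign(static_cast<std::size_t>(t.n) * c * h * w, 0.0f);
    return t;
}

// Inference side: maps [B, C, H, W] to [B, K] scores. Only `data` is
// computed (zeros here), but `traces` is forwarded unchanged, so output
// row b can still be matched to traces[b].videoIndex downstream.
TaggedTensor infer(const TaggedTensor& in, int k) {
    assert(k > 0);
    TaggedTensor out{in.n, k, 1, 1, {}, in.traces};
    out.data.assign(static_cast<std::size_t>(in.n) * k, 0.0f);
    return out;
}
```

The design choice here is that the tensor shape stays exactly what the .prototxt declares; the indices are attached by whoever fills the batch (the decoder) and simply carried along by every later stage.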

2. How does the parser module know that pCov and pBBOX are in CPU (host) memory?
I studied both DeepStream's detection sample and TensorRT's faster-rcnn sample. The code in DeepStream:

const float *pCov = reinterpret_cast<const float*>(vpInputTensors[0]->getConstCpuData());
std::vector<TRACE_INFO > trace_0 = vpInputTensors[0]->getTraceInfos();
const float *pBBOX = reinterpret_cast<const float*>(vpInputTensors[1]->getConstCpuData());

The code in TensorRT:

CHECK(cudaMalloc(&buffers[outputIndex0], batchSize * nmsMaxOut * OUTPUT_BBOX_SIZE * sizeof(float))); // bbox_pred
CHECK(cudaMalloc(&buffers[outputIndex1], batchSize * nmsMaxOut * OUTPUT_CLS_SIZE * sizeof(float)));  // cls_prob
CHECK(cudaMalloc(&buffers[outputIndex2], batchSize * nmsMaxOut * 4 * sizeof(float)));                // rois
...
CHECK(cudaMemcpyAsync(outputBboxPred, buffers[outputIndex0], batchSize * nmsMaxOut * OUTPUT_BBOX_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
CHECK(cudaMemcpyAsync(outputClsProb, buffers[outputIndex1], batchSize * nmsMaxOut * OUTPUT_CLS_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
CHECK(cudaMemcpyAsync(outputRois, buffers[outputIndex2], batchSize * nmsMaxOut * 4 * sizeof(float), cudaMemcpyDeviceToHost, stream));

TensorRT user code has to copy memory between the host and the CUDA device explicitly, so why aren't these steps needed in DeepStream?
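One way a framework can hide this transfer is a buffer that owns a device copy plus a lazily refreshed host mirror. The class below is a plain C++ simulation, not DeepStream code: MirroredBuffer, mutableDeviceData and hostData are hypothetical names, the "device" memory is just a std::vector, and in a real CUDA build the marked line would be a cudaMemcpy with cudaMemcpyDeviceToHost. It only shows why a caller of an accessor like getConstCpuData() would not need to issue the copy itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulation only: both buffers live in host RAM here.
class MirroredBuffer {
public:
    explicit MirroredBuffer(std::size_t count) : device_(count, 0.0f), host_(count, 0.0f) {
        assert(count > 0);
    }

    // Producer side (e.g. the inference engine) writes "device" memory
    // and marks the host mirror as stale.
    float* mutableDeviceData() {
        hostStale_ = true;
        return device_.data();
    }

    // Consumer side (e.g. a parser) asks for host data; the copy happens
    // here, at most once per write, so the caller never copies by hand.
    const float* hostData() {
        if (hostStale_) {
            host_ = device_;  // stands in for a device-to-host cudaMemcpy
            hostStale_ = false;
            ++copies_;
        }
        return host_.data();
    }

    int copyCount() const { return copies_; }

private:
    std::vector<float> device_;
    std::vector<float> host_;
    bool hostStale_ = false;
    int copies_ = 0;
};
```

Copying only when the host side is stale means a consumer that never reads on the CPU never pays for the transfer, and repeated reads after a single write cost one copy.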

3. How do I set up DeepStream if the inference module uses a different data layout?
We have a .prototxt network whose input is four-dimensional, [B*10*448*448], but the “channel” dimension no longer holds BGR color: each frame is converted to grayscale, so this dimension now holds 10 frames. If B is 4, each input of this tensor needs 4*10 = 40 frames at the TensorRT level. Can DeepStream support this scenario? In short: how can we get only 1 output for every 10 frames fed into the inference engine?
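The packing described above amounts to simple index arithmetic. The following sketch assumes frames are stacked in arrival order (frames 0-9 form sample 0, frames 10-19 form sample 1, and so on); slotForFrame and offsetOf are illustrative helpers, not part of either SDK.

```cpp
#include <cassert>
#include <cstddef>

// Network input from the question: [B, 10, 448, 448], where the "channel"
// axis holds 10 consecutive grayscale frames instead of B/G/R.
constexpr int kFramesPerSample = 10;
constexpr int kH = 448;
constexpr int kW = 448;

struct SampleSlot {
    int batch;    // which of the B samples the frame belongs to
    int channel;  // position of the frame inside that sample (0..9)
};

// Assumes frames are packed in arrival order.
SampleSlot slotForFrame(int frame) {
    assert(frame >= 0);
    return {frame / kFramesPerSample, frame % kFramesPerSample};
}

// Flat NCHW offset (in floats) of pixel (y, x) of a given frame.
std::size_t offsetOf(int frame, int y, int x) {
    assert(y >= 0 && y < kH && x >= 0 && x < kW);
    SampleSlot s = slotForFrame(frame);
    return ((static_cast<std::size_t>(s.batch) * kFramesPerSample + s.channel) * kH + y) * kW + x;
}
```

Because batch * 10 + channel equals the frame number, offsetOf(frame, y, x) is the same as frame * 448 * 448 + y * 448 + x. In other words, a [4, 10, 448, 448] buffer is laid out exactly like a [40, 1, 448, 448] batch; only the interpretation, and therefore the number of outputs, differs.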

AastaLLL · December 25, 2017, 7:19am

Hi,

1. Index information is set when decoding frames.

2. There is still a memcpy, but it is handled by the DeepStream API.

3. If your use case can be treated as 40 independent inputs (batch = 40), you should be able to run inference with DeepStream and TensorRT using a channel size of 1.

Thanks.

haifengli · December 25, 2017, 8:07am

Thanks for your reply. I'd like to discuss your idea about the input data layout with DeepStream.
3. The .prototxt supports batched input data, so the input layer should be:

name: "SomeNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 40 dim: 1 dim: 448 dim: 448 } }
}

and the output layer could be:

prob (Softmax), blob shape: [4, 1000]

for a classifier network, so every 40 input frames produce 4 outputs of 1000 classes each. But how can we inject a custom operation that converts [40, 1, 448, 448] into [4, 1, 448, 224] (every ten 448x448 images produce one 448x224 output image), given that DeepStream hides the setPluginFactory function exposed by CaffeParser? Do you think it is possible to leave the DeepStream inference network input as [4, 1, 448, 224], but call addCustomerTask to add a custom module before the inference task?

haifengli · December 25, 2017, 8:37am

I discussed the DeepStream input data layout with my team leader, and the description in my previous reply was not correct, so I'm adding this reply to clarify.
Since DeepStream handles the frame pool and custom modules are added to the flexible/analysis pipeline, what we want is for DeepStream to feed 10 frames (or a multiple of 10) into the flexible pipeline for each epoch. The pseudocode might be:

class OpenCVModule : public IModule {
  // input:  [40, 1, 448, 448]
  // output: [4, 20, 448, 224]
};

int main() {
  IDeviceWorker *pDeviceWorker = createDeviceWorker();
  pDeviceWorker->addDecodeTask(cudaVideoCodec_H264);
  IModule *pConvertor = pDeviceWorker->addColorSpaceConvertorTask(BGR_PLANAR);

  // Add a custom OpenCV module after color-space conversion
  PRE_MODULE_LIST preModules_cv;
  preModules_cv.push_back(std::make_pair(pConvertor, 0)); // BGR_PLANAR
  OpenCVModule *pCV = new OpenCVModule(preModules_cv, ...);
  pDeviceWorker->addCustomerTask(pCV);

  // Add the inference task, fed by the custom module instead of the convertor
  IModule *pInferModule = pDeviceWorker->addInferenceTask(std::make_pair(pCV, 0),
    "deploy.prototxt",
    "final.caffemodel",
    nullptr, // meanFile
    "data",
    {"prob"},
    g_nChannels);

  // Parse the classifier output
  PRE_MODULE_LIST preModules_parser;
  preModules_parser.push_back(std::make_pair(pInferModule, 0)); // prob: [4, 1000]
  ParserModule *pParser = new ParserModule(preModules_parser, ...);
  assert(nullptr != pParser);
  pDeviceWorker->addCustomerTask(pParser);
  ...
}

4. Is it possible for DeepStream to work with this analysis pipeline?
5. Does OpenCVModule::execute need to copy the TRACE_INFO from the input stream tensor to the output stream tensor by calling setTraceInfo, in order to preserve the frame and video index information?
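For question 5, the bookkeeping can be sketched independently of the DeepStream module interface. FrameGrouper, FrameGroup and TraceInfo below are hypothetical (TraceInfo only mirrors the frameIndex/videoIndex fields of the sample's TRACE_INFO), and the sketch assumes one grouper per videoIndex so that a group never mixes streams. It collects 10 frames, emits one group, and carries all 10 trace entries with it instead of discarding them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the sample's TRACE_INFO.
struct TraceInfo {
    int frameIndex;
    int videoIndex;
};

// One completed group, ready to become one channel-stacked sample.
struct FrameGroup {
    std::vector<int> frameIds;      // placeholder for the pixel data
    std::vector<TraceInfo> traces;  // metadata copied from every input frame
};

// Hypothetical accumulator: emits a group each time groupSize frames have
// arrived; leftover frames wait for later pushes.
class FrameGrouper {
public:
    explicit FrameGrouper(std::size_t groupSize) : groupSize_(groupSize) {
        assert(groupSize_ > 0);
    }

    // Returns the groups completed by this push (zero or one).
    std::vector<FrameGroup> push(int frameId, const TraceInfo& trace) {
        pending_.frameIds.push_back(frameId);
        pending_.traces.push_back(trace);  // keep the metadata, do not drop it
        std::vector<FrameGroup> done;
        if (pending_.frameIds.size() == groupSize_) {
            done.push_back(std::move(pending_));
            pending_ = FrameGroup{};
        }
        return done;
    }

private:
    std::size_t groupSize_;
    FrameGroup pending_;
};
```

Keeping every trace entry, rather than only the first, lets the parser report which frame range an output row covers. Whether DeepStream's setTraceInfo expects one entry per output item or per input frame is not confirmed in this thread.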

AastaLLL · December 26, 2017, 8:31am

Hi,

I guess there is some misunderstanding between us.

1. If your workflow is [B*10*448*448] → [B*1000], the suggestion in #2 is not appropriate for your use case.
That suggestion assumes the images in a batch run independently and no cross-batch computation exists, so the output would be [10B*1000].

2. Dynamic input is not supported by TensorRT. We are looking into it, but there is no concrete schedule.

3. Yes.
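The difference between the two readings of the input tensor in point 1 can be checked with a little arithmetic. The helper names are hypothetical; the 10 frames per sample and 1000 classes come from this thread.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kFramesPerSample = 10;  // from this thread

// Scores produced by one enqueue of a classifier: one row per batch item.
std::size_t outputCount(std::size_t batchItems, std::size_t numClasses) {
    assert(batchItems > 0 && numClasses > 0);
    return batchItems * numClasses;
}

// Batch-40 reading: every grayscale frame is its own batch item (C = 1),
// so B samples of 10 frames become 10 * B independent items.
std::size_t independentFramesOutput(std::size_t B, std::size_t numClasses) {
    return outputCount(kFramesPerSample * B, numClasses);
}

// Channel-stacked reading: 10 frames share one batch item (C = 10),
// so the engine sees only B items.
std::size_t stackedChannelsOutput(std::size_t B, std::size_t numClasses) {
    return outputCount(B, numClasses);
}
```

For B = 4 and 1000 classes, the batch-40 reading yields 40000 scores (40 rows), while the channel-stacked reading yields 4000 (4 rows), which is the [B*1000] output wanted here.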

Thanks.

haifengli · December 26, 2017, 9:39am

The channel count in the [BCHW] layout is ten, and it is built from ten grayscale images outside the scope of TensorRT or DeepStream. TensorRT can ignore what “C” means, but DeepStream needs to feed “C” with the RGB color space and drive the analysis pipeline with a dynamic number of frames.
Thanks.

AastaLLL · December 27, 2017, 6:03am

Hi,

Sounds cool!

I did a quick try with the MNIST network:
after setting the input to 10x28x28, TensorRT runs inference on it correctly.
This idea looks workable.

Please let us know if you have any further updates on this.
Thanks.

haifengli · December 30, 2017, 2:10am

Sorry for the late reply. I studied TensorRT's MNIST example, and only the batch size can change at runtime:

input: "data"
input_shape {
  dim: 1
  dim: 1
  dim: 28
  dim: 28
}

CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context.enqueue(batchSize, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE*sizeof(float), cudaMemcpyDeviceToHost, stream));
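In the MNIST sample the channel count does not appear in the copy sizes because C = 1. If the input is changed to 10x28x28, as in the quick test mentioned above, the host-to-device copy has to include the channel factor as well, while the channel and spatial dimensions stay fixed when the engine is built. inputBytes is an illustrative helper, not a TensorRT function.

```cpp
#include <cassert>
#include <cstddef>

// Bytes to copy for one enqueue() call with an NCHW float input.
// The MNIST sample's batchSize * INPUT_H * INPUT_W * sizeof(float)
// is this formula with channels == 1.
std::size_t inputBytes(int batchSize, int channels, int h, int w) {
    assert(batchSize > 0 && channels > 0 && h > 0 && w > 0);
    return static_cast<std::size_t>(batchSize) * channels * h * w * sizeof(float);
}
```

Usage: with batchSize = 4 and a 10x28x28 input, inputBytes(4, 10, 28, 28) is 31360 floats' worth of bytes, ten times what the unmodified MNIST expression would copy.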

hn2008.lee · November 29, 2018, 7:33am

Hi haifengli,

I'm dealing with the same problem now. Could you send me your user-defined module “OpenCVModule”?
