Description
I am trying to learn how to use the Loop API in TensorRT, so I wrote the demo code below.
However, the result is the same as input_1 instead of input_1 with input_2 added to it 10 times.
/*
 * Use an elementwise op to add tensors together, with a specific loop count.
 * The equivalent C++ code would look like this:
 *   for (auto i = 0; i < 10; ++i) {
 *       input_1 += input_2;
 *   }
 */
nvinfer1::Dims input_dims{4, {1, 3, 5, 5}};
auto input_1 = network.addInput("input_1", nvinfer1::DataType::kFLOAT, input_dims);
auto input_2 = network.addInput("input_2", nvinfer1::DataType::kFLOAT, input_dims);

// Trip count, passed to the loop as a 0-D INT32 constant tensor.
int32_t sequence_size = 10;
auto loop_count_layer =
    network.addConstant(nvinfer1::Dims{0, {0}},
                        nvinfer1::Weights{nvinfer1::DataType::kINT32, &sequence_size, 1});
auto loop_count_tensor = loop_count_layer->getOutput(0);

auto loop = network.addLoop();
loop->addTripLimit(*loop_count_tensor, nvinfer1::TripLimit::kCOUNT);

// The recurrence starts at input_1; each iteration is meant to add input_2 to it.
auto rec_layer = loop->addRecurrence(*input_1);
auto elementwise_layer = network.addElementWise(*rec_layer->getOutput(0), *input_2,
                                                nvinfer1::ElementWiseOperation::kSUM);
auto accumulated_values = elementwise_layer->getOutput(0);
rec_layer->setInput(1, *accumulated_values);

// Intended to expose the final accumulated value as the network output.
auto output_layer =
    loop->addLoopOutput(*rec_layer->getOutput(0), nvinfer1::LoopOutput::kLAST_VALUE);
network.markOutput(*output_layer->getOutput(0));
Could you please tell me what the correct usage is?
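For comparison, the result I expect can be written as a plain CPU loop. This is only a minimal sketch, assuming the tensors are flattened into float buffers of equal length; accumulate_reference is a hypothetical helper name and is not part of TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a TensorRT API): CPU reference for the intended loop.
// Adds `addend` to `initial` element-wise, `trip_count` times, and returns the result.
std::vector<float> accumulate_reference(std::vector<float> initial,
                                        const std::vector<float>& addend,
                                        int32_t trip_count) {
    // Both buffers are assumed to hold the same number of elements.
    assert(initial.size() == addend.size());
    for (int32_t i = 0; i < trip_count; ++i) {
        for (std::size_t j = 0; j < initial.size(); ++j) {
            initial[j] += addend[j];
        }
    }
    return initial;
}
```

With input_1 filled with 1.0 and input_2 filled with 0.5, calling accumulate_reference(input_1, input_2, 10) yields 6.0 in every element, which is the output I would expect from the engine as well.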
Environment
TensorRT Version: 8.4.1
GPU Type: T4
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):