Description
Suppose I define an IConvolutionLayer conv1, and I have N input tensors of the same shape that should all be fed through conv1. Is there a way to reuse conv1 without building N separate IConvolutionLayer instances? I tried ILoop, but the result is not what I expected:
nv::ILoop* loop = network->addLoop();
loop->addTripLimit(*N, nv::TripLimit::kCOUNT);
// The N inputs are concatenated along axis 1; the iterator yields one slice per trip.
nv::IIteratorLayer* iter = loop->addIterator(*input, 1);
nv::IConvolutionLayer* conv = network->addConvolutionNd(*iter->getOutput(0),
                                                        num_out_channel,
                                                        nv::Dims2{ size_kernel, size_kernel },
                                                        weightMap[lname + ".weight"],
                                                        nv::Weights{});  // no bias
LOG_ASSERT(conv, "add convolution layer failed.");
int padding = (size_kernel - 1) / 2;
conv->setPaddingNd(nv::Dims2{ padding, padding });
conv->setStrideNd(nv::Dims2{ stride, stride });
conv->setNbGroups(num_group);
// Re-concatenate the per-iteration outputs along axis 1.
nv::ILoopOutputLayer* output = loop->addLoopOutput(*conv->getOutput(0), nv::LoopOutput::kCONCATENATE, 1);
output->setInput(1, *N);
return output;
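For context, the semantics I expect from the loop above: the same kernel is applied independently to each of the N slices, so applying one convolution to the N tensors stacked along an outer axis should give exactly the N per-tensor results concatenated back together. The following plain C++ sketch (no TensorRT involved; `conv2d_same` and `conv2d_batch` are hypothetical helpers written only for this illustration) demonstrates that equivalence for a single-channel 2D convolution with "same" zero padding:

```cpp
#include <cassert>
#include <vector>

// "Same"-padding 2D convolution of one single-channel h x w image (row-major)
// with a k x k kernel, where k is odd. Purely illustrative, not TensorRT code.
std::vector<float> conv2d_same(const std::vector<float>& img, int h, int w,
                               const std::vector<float>& ker, int k) {
    int pad = (k - 1) / 2;
    std::vector<float> out(h * w, 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx) {
                    int iy = y + ky - pad, ix = x + kx - pad;
                    if (iy >= 0 && iy < h && ix >= 0 && ix < w)
                        acc += img[iy * w + ix] * ker[ky * k + kx];
                }
            out[y * w + x] = acc;
        }
    return out;
}

// Applies the SAME kernel to n images stacked contiguously on the leading
// axis, indexing the flat buffer directly. Because convolution never mixes
// values across that axis, the result must equal the per-image results
// laid end to end.
std::vector<float> conv2d_batch(const std::vector<float>& batch, int n, int h,
                                int w, const std::vector<float>& ker, int k) {
    int pad = (k - 1) / 2;
    std::vector<float> out(n * h * w, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float acc = 0.0f;
                for (int ky = 0; ky < k; ++ky)
                    for (int kx = 0; kx < k; ++kx) {
                        int iy = y + ky - pad, ix = x + kx - pad;
                        if (iy >= 0 && iy < h && ix >= 0 && ix < w)
                            acc += batch[i * h * w + iy * w + ix] * ker[ky * k + kx];
                    }
                out[i * h * w + y * w + x] = acc;
            }
    return out;
}
```

This is what I want the ILoop construction to express: one set of weights, applied once per slice, with the outputs concatenated back on the same axis.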
Environment
TensorRT Version: 8.4
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered