Reuse of IConvolutionLayer (multiple inputs, shared conv layer)


Suppose I define an IConvolutionLayer conv1 and have N input tensors of the same shape. All N tensors should be fed through conv1. Is there a way to reuse conv1 without building N separate IConvolutionLayers? I tried ILoop, but the result is not what I expected:

    nv::ILoop* loop = network->addLoop();
    loop->addTripLimit(*N, nv::TripLimit::kCOUNT);
    nv::IIteratorLayer* iter = loop->addIterator(*input, 1); // N inputs are concatenated in axis 1
    nv::IConvolutionLayer* conv = network->addConvolutionNd(*iter->getOutput(0),
                                                            num_output_maps, // number of output feature maps (missing in my first paste)
                                                            nv::Dims2{ size_kernel, size_kernel },
                                                            weightMap[lname + ".weight"], nv::Weights{});
    LOG_ASSERT(conv, "add convolution layer failed.");
    int padding = (size_kernel-1)/2;
    conv->setPaddingNd(nv::Dims2{ padding, padding });
    conv->setStrideNd(nv::Dims2{ stride, stride });

    nv::ILoopOutputLayer* output = loop->addLoopOutput(*conv->getOutput(0), nv::LoopOutput::kCONCATENATE, 1);
    output->setInput(1, *N);
    return output;


TensorRT Version: 8.4



Please let us know why you are interested in using N convolution layers. If engine size is the main concern, the TRT engine has a weight de-duplication mechanism, so the engine size will not go up if you add more convolution layers that share the same weights.
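For reference, the suggested approach could look like the sketch below: build N convolution layers in a plain loop, passing the same Weights struct to each, and let the builder de-duplicate them. The names `inputs`, `num_output_maps`, and the hyperparameters are assumptions carried over from the snippet in the question; this is not verified code.

    // Hedged sketch: instead of ILoop, add N separate IConvolutionLayers that
    // all reference the same Weights struct. TensorRT de-duplicates identical
    // weights at build time, so the serialized engine stores them only once.
    nv::Weights sharedW = weightMap[lname + ".weight"];  // single host-side copy
    std::vector<nv::ITensor*> outputs;
    for (int i = 0; i < N; ++i) {
        // inputs[i] are the N same-shaped input tensors from the question
        nv::IConvolutionLayer* conv = network->addConvolutionNd(
            *inputs[i],
            num_output_maps,
            nv::Dims2{ size_kernel, size_kernel },
            sharedW,           // same weights for every layer
            nv::Weights{});    // no bias
        LOG_ASSERT(conv, "add convolution layer failed.");
        conv->setPaddingNd(nv::Dims2{ padding, padding });
        conv->setStrideNd(nv::Dims2{ stride, stride });
        outputs.push_back(conv->getOutput(0));
    }

If the N results need to be stacked, the collected `outputs` can then be fed to a single IConcatenationLayer, mirroring the kCONCATENATE loop output in the original attempt.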

Thank you.

Thanks for your reply. Actually, my concern is GPU memory usage: do shared weights occupy more GPU memory as the number of shared-weight convolution layers goes up during inference?