How does a TensorRT plugin (IPluginV2) get enough workspace in explicit batch mode?

Description

In implicit batch mode, the TensorRT engine has a parameter, maxBatchSize. We set it via IBuilder::setMaxBatchSize, and my plugin calculates the workspace it needs in IPluginV2::getWorkspaceSize. TensorRT guarantees that the batchSize argument passed to IPluginV2::enqueue is no greater than maxBatchSize.
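As a concrete illustration of the implicit-batch contract described above, here is the kind of sizing arithmetic such a plugin might do inside getWorkspaceSize. The per-sample element count and the 256-byte alignment are hypothetical example values, not from the original post:

```cpp
#include <cstddef>

// Round n up to a multiple of a (256 bytes is a common CUDA alignment).
static std::size_t alignUp(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Mirrors what an IPluginV2::getWorkspaceSize(int maxBatchSize) override
// might compute: one aligned float scratch buffer per batch item.
// The allocation scales with maxBatchSize, so any enqueue() call with a
// larger batch would overflow this workspace.
std::size_t getWorkspaceSizeForBatch(int maxBatchSize, std::size_t elemsPerSample) {
    std::size_t perSample = alignUp(elemsPerSample * sizeof(float), 256);
    return perSample * static_cast<std::size_t>(maxBatchSize);
}
```

This is exactly the assumption that breaks in explicit batch mode: the formula is only safe while TensorRT enforces batchSize <= maxBatchSize.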

However, in explicit batch mode, the batchSize parameter of IExecutionContext::enqueue has no effect (at least since TensorRT 8.5.3), and IPluginV2::enqueue can receive a batchSize argument greater than maxBatchSize.

This is the problem: the workspace size is calculated from maxBatchSize, but at execution time nothing guarantees that batchSize <= maxBatchSize (maybe there is a way I don't know about), so there is a risk of a buffer overflow.

My question is: as a plugin author, how do you request a large enough workspace when getWorkspaceSize has only one parameter, maxBatchSize (still true as of TensorRT 10.7.0)? I'd be glad to hear your suggestions.
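For context: TensorRT's dynamic-shape plugin interface, IPluginV2DynamicExt, replaces the bare maxBatchSize parameter of getWorkspaceSize with full tensor descriptors, so the workspace can be sized from the tensor dimensions (which include the batch dimension in explicit batch mode). Below is a minimal sketch of that sizing logic; the Dims and PluginTensorDesc structs are simplified stand-ins for the nvinfer1 types, used only so the sketch compiles without the TensorRT headers:

```cpp
#include <cstddef>

// Simplified stand-ins for nvinfer1::Dims / nvinfer1::PluginTensorDesc,
// included only so this sketch is self-contained.
struct Dims { int nbDims; int d[8]; };
struct PluginTensorDesc { Dims dims; };

// Sketch of what an IPluginV2DynamicExt::getWorkspaceSize override might do:
// derive the scratch size from the descriptor's dimensions (the batch
// dimension is d[0] in explicit batch mode) rather than from a separately
// tracked maxBatchSize. The float scratch buffer is a hypothetical example.
std::size_t getWorkspaceSizeDynamic(const PluginTensorDesc* inputs, int nbInputs) {
    if (nbInputs < 1) return 0;
    const Dims& d = inputs[0].dims;
    std::size_t elems = 1;
    for (int i = 0; i < d.nbDims; ++i)
        elems *= static_cast<std::size_t>(d.d[i]);
    return elems * sizeof(float);
}
```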

Hi @mengzking ,
Quoting from the docs:
Some TensorRT algorithms require additional workspace on the GPU. The method IBuilderConfig::setMemoryPoolLimit() controls the maximum amount of workspace that can be allocated and prevents algorithms that require more workspace from being considered by the builder. At runtime, the space is allocated automatically when creating an IExecutionContext. The amount allocated is no more than is required, even if the amount set in IBuilderConfig::setMemoryPoolLimit() is much higher. Applications should, therefore, allow the TensorRT builder as much workspace as they can afford; at runtime, TensorRT allocates no more than this and typically less. The workspace size may need to be limited to less than the full device memory size if device memory is needed for other purposes during the engine build.

Does this help?
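For reference, the builder-side configuration the quoted passage refers to looks roughly like this (the 1 GiB ceiling is an arbitrary example value; TensorRT allocates only what the engine actually needs when the IExecutionContext is created):

```cpp
#include "NvInfer.h"

// Give TensorRT a generous workspace ceiling at build time, per the
// quoted documentation; actual runtime allocation is no more than needed.
void configureWorkspace(nvinfer1::IBuilderConfig& config) {
    config.setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1ULL << 30);
}
```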