nvinfer1::ICudaEngine::createExecutionContextWithoutDeviceMemory() returns nullptr!

I am trying to use TensorRT to run inference on a TensorFlow model that I converted to UFF.
I want to use GPU unified memory to hold the activation data. However, my code fails to create an execution context: when context = engine->createExecutionContextWithoutDeviceMemory(); is executed, a null pointer is returned and I get the following error:

[02/12/2020-09:29:57] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[02/12/2020-09:30:03] [I] [TRT] Detected 1 inputs and 1 output network tensors.
Creating execution context
140557824 602112
[02/12/2020-09:30:03] [E] [TRT] …/rtSafe/cuda/caskConvolutionRunner.cpp (233) - Cuda Error in allocateContextResources: 1 (invalid argument)
[02/12/2020-09:30:03] [E] [TRT] FAILED_ALLOCATION: std::exception

However, if I call context = engine->createExecutionContext() instead, the execution context is created successfully. Can someone please help me with this?
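For anyone reading the log above: the two numbers printed after "Creating execution context" are the value returned by engine->getDeviceMemorySize() and the size in bytes of one 3x224x224 float input. The short sketch below only redoes that arithmetic so the sizes can be compared with the 1 << 20 workspace limit set in the code. The helper names tensorBytes and toMiB are made up for illustration and are not part of TensorRT, and the sketch assumes a 4-byte float.

```cpp
#include <cstddef>

// Hypothetical helper (not a TensorRT API): bytes needed for a dense
// float tensor with dimensions C x H x W.
constexpr std::size_t tensorBytes(std::size_t c, std::size_t h, std::size_t w) {
    return sizeof(float) * c * h * w;
}

// Hypothetical helper: convert a byte count to MiB for readability.
constexpr double toMiB(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Value passed to setMaxWorkspaceSize in the posted code: exactly 1 MiB.
constexpr std::size_t kWorkspaceBytes = std::size_t{1} << 20;

// First number in the log: what getDeviceMemorySize() reported.
constexpr std::size_t kActivationBytes = 140557824;

// Second number in the log: one 3x224x224 float input
// (assumes sizeof(float) == 4, as on common platforms).
constexpr std::size_t kInputBytes = tensorBytes(3, 224, 224);

static_assert(kInputBytes == 602112, "matches the second number in the log");
static_assert(kActivationBytes > kWorkspaceBytes,
              "activation memory is far larger than the 1 MiB workspace");
```

Here toMiB(kActivationBytes) comes out to roughly 134 MiB, so a buffer supplied by the caller for the activations would need to be at least that large. The 1 MiB workspace limit is a separate setting, and it is what the "insufficient workspace memory" message in the log refers to.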

Here is my code.

int main(){
        std::cout << "\n Running \n";
        IBuilder* builder = createInferBuilder(gLogger);
        INetworkDefinition* network = builder->createNetwork();
        nvuffparser::IUffParser* parser = nvuffparser::createUffParser();
        parser->registerInput(INPUT_BLOB_NAME, DimsCHW(3, 224, 224), nvuffparser::UffInputOrder::kNCHW);
        //const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);  
        parser->parse("/mnt/model/inception/inception.uff", *network, nvinfer1::DataType::kFLOAT);
        IBuilderConfig* config = builder->createBuilderConfig();
        config->setMaxWorkspaceSize(1 << 20);
        ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
        std::cout<<"Creating execution context"<<endl;
        size_t engine_size = engine->getDeviceMemorySize();
        std::cout<<engine_size<<" "<<sizeof(float)*224*224*3<<endl;

        IExecutionContext *context;
        context = engine->createExecutionContextWithoutDeviceMemory();
        //context = engine->createExecutionContext();
        if (context == nullptr){
                std::cout<<"FAILED ALLOCATION"<<context<<endl;



This is a known issue; we will address it in a future release.
Stay tuned for announcements.


Is there any workaround I can use to load the activation data of my model into GPU unified memory?

@SunilJB Has there been any update to resolve this?

Hi @curiousguy,
I think the issue is fixed in the latest TRT 7.2 release. Could you please try the latest version and let us know if you are still facing issues?


@SunilJB What would be the solution for Windows?