`Cuda Error in allocate: 2` when building engine

Hello,

I’m using the TensorRT C++ API to build an inference engine. As the code below shows, I first set the builder’s max workspace size to the available GPU memory, and then parse the UFF model and build the engine. But the build fails when builder->buildCudaEngine(*network) executes, reporting “Cuda Error in allocate: 2”.

A work-around is to set the max workspace size to some other value, but I’m not sure how to choose it. Can anyone give a hint?

    // Get memory info from the device (wraps cudaMemGetInfo)
    size_t freeMem, totalMem;
    getMemoryInfo(&freeMem, &totalMem);

    IBuilder *builder = createInferBuilder(logger);
    INetworkDefinition *network = builder->createNetwork();
    IUffParser *P = createUffParser();

    builder->setMaxBatchSize(MAXBATCHSIZE);
    builder->setMaxWorkspaceSize(freeMem);  // all free GPU memory

    // Doing some input/output registrations
    if (!P->parse(modelDir, *network, DataType::kFLOAT))
    {
        logger.log(ILogger::Severity::kERROR, "Parsing UFF file failed");
        engine = nullptr;
        context = nullptr;
    }
    else
    {
        engine = builder->buildCudaEngine(*network);  // "Cuda Error in allocate: 2" happens here when freeMem is passed to setMaxWorkspaceSize
    }
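One common approach is to cap the workspace at a fraction of free memory rather than handing the builder every free byte, since the engine itself, activations, and the CUDA context also need room. A minimal sketch of that idea, assuming getMemoryInfo wraps cudaMemGetInfo and with a helper name (chooseWorkspaceSize) that is my own invention:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (name is mine, not a TensorRT API): pick a workspace
// size well below the free GPU memory, so the engine, activations, and the
// CUDA context still fit alongside the build-time scratch space.
std::size_t chooseWorkspaceSize(std::size_t freeMem)
{
    const std::size_t oneGiB = std::size_t(1) << 30;
    return std::min(freeMem / 2, oneGiB);  // half of free memory, capped at 1 GiB
}
```

The builder treats the value only as an upper bound on per-layer scratch space, so a generous-but-bounded value is safe.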

Hi, what is the current value of the workspace size (freeMem)? Can you try this?:

builder->setMaxWorkspaceSize(2<<10);

Here is the reference documentation:

"
2.3. Building An Engine In C++
Two particularly important properties are the maximum batch size and the maximum workspace size:
• The maximum batch size specifies the batch size for which TensorRT will optimize. At runtime, a smaller batch size may be chosen.
• Layer algorithms often require temporary workspace. This parameter limits the maximum size that any layer in the network can use. If insufficient scratch is provided, it is possible that TensorRT may not be able to find an implementation for a given layer. "

1. Build the engine using the builder object:

builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(1 << 20);
ICudaEngine* engine = builder->buildCudaEngine(*network);

Source: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#build_engine_c

Hi sanchezvr7,

Thanks for your reply. It turned out that I had made a silly mistake in my implementation: when allocating memory for my input, I accidentally typed parentheses where square brackets belonged, causing a memory allocation fault that led to the error. The problem is solved now and my program runs smoothly.