TensorRT 3.0.1 - SSDNormalizePlugin destroy fails

I am trying to run SSD with TensorRT 3.0.1.
I took the sampleFasterRCNN sample and expanded it to create the required plugins for SSD.
The GIE model is built successfully, but when I call pluginFactory.destroyPlugin() I get an error:

NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception

Any idea what goes wrong?
Thanks.

It looks like the code is trying to do a cudaFree on the weights.values and the pointer is invalid.

Reference: CUDA Runtime API :: CUDA Toolkit Documentation

cudaErrorInvalidDevicePointer = 17
This indicates that at least one device pointer passed to the API call is not a valid device pointer.
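As a sanity check, this standalone snippet (my own illustration, nothing from TensorRT) produces the same error code by handing cudaFree a pointer that was never allocated on the device:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    float hostBuffer[8];                    // host memory, not a device pointer
    cudaError_t err = cudaFree(hostBuffer); // invalid device pointer
    printf("cudaFree returned %d: %s\n", static_cast<int>(err), cudaGetErrorString(err));
    // on the CUDA versions contemporary with TensorRT 3 this prints:
    // cudaFree returned 17: invalid device pointer
    return 0;
}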

Do I need to copy weights to GPU memory before giving it to createSSDNormalizePlugin?
This is how I initialize the layer in my plugin factory:

virtual nvinfer1::IPlugin* createPlugin(const char* layerName, const nvinfer1::Weights* weights, int nbWeights) override
{
    // map the Normalize layer onto the built-in SSD plugin:
    // createSSDNormalizePlugin(weights, acrossSpatial, channelShared, eps)
    if (!strcmp(layerName, "conv4_3_norm"))
    {
        _nvPlugins[layerName] = nvinfer1::plugin::createSSDNormalizePlugin(weights, false, false, 1e-10);
        return _nvPlugins.at(layerName);
    }

    ...
}
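For completeness, my destroyPlugin() is essentially the following (simplified, but it just walks the map):

void destroyPlugin()
{
    // call destroy() on every plugin created by createPlugin();
    // the cudaFree inside ~Normalize fires from here
    for (auto& entry : _nvPlugins)
    {
        entry.second->destroy();
    }
    _nvPlugins.clear();
}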

Thanks.

It looks like it does the copy from the host to the device inside the class, so that's not the problem. I see it does expect the data to be of type float, or at least of sizeof(float). To rule out any memory-bounds issues, could you try running your program with cuda-memcheck?
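Paraphrasing what it appears to do internally (a sketch, not the actual source):

#include <cuda_runtime.h>
#include "NvInfer.h"

// sketch of the host-to-device weight copy the plugin appears to perform
static float* copyWeightsToDevice(const nvinfer1::Weights& w)
{
    float* devPtr = nullptr;
    size_t bytes = w.count * sizeof(float); // data assumed to be float
    cudaMalloc(reinterpret_cast<void**>(&devPtr), bytes);
    cudaMemcpy(devPtr, w.values, bytes, cudaMemcpyHostToDevice);
    return devPtr; // ~Normalize later calls cudaFree on this pointer
}

So passing host-side weights to createSSDNormalizePlugin should be fine.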

cuda-memcheck results:

NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
terminate called after throwing an instance of 'nvinfer1::CudaError'
what(): std::exception
========= Error: process didn't terminate successfully
========= Internal error (7)
========= No CUDA-MEMCHECK results found

You will need to catch the exception and exit(1) or something similar. cuda-memcheck won’t be able to give a report unless it can exit the program cleanly.
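Something like this around the failing call (needs <cstdio>, <cstdlib> and <exception>):

// inside main(), wrapping the call that currently aborts the process
try
{
    pluginFactory.destroyPlugin();
}
catch (const std::exception& e) // nvinfer1::CudaError derives from std::exception
{
    std::fprintf(stderr, "caught: %s\n", e.what());
    std::exit(1); // exit cleanly so cuda-memcheck can write its report
}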

I wrapped the call to the plugin destroy with a try/catch block and added exit(1) in the catch.
Now the output of cuda-memcheck is:


Block size 1048576
Block size 165888

Total Activation Memory: 4324064
NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
========= Internal error (7)
========= No CUDA-MEMCHECK results found

I'm not sure why that internal pointer is getting corrupted. If you have the ability to run your code with AddressSanitizer or valgrind, that would be a good next step. Do you call destroy() on the object? It also frees on destruction, so calling destroy() could cause a double free of sorts.
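In other words, something along these lines inside the plugin would explain it (pure speculation on my part, not the actual source):

// speculative sketch of the suspected failure mode
class Normalize
{
public:
    void destroy()
    {
        cudaFree(mDeviceWeights); // first free
        delete this;              // destructor runs next...
    }

protected:
    ~Normalize()
    {
        cudaFree(mDeviceWeights); // ...and frees the same pointer again -> error 17
    }

private:
    float* mDeviceWeights{nullptr};
};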

Yes, I am calling destroy() on the object, just like in sampleFasterRCNN, where PluginFactory::destroyPlugin calls destroy() on the plugin layer (via the deleter function given to the mPluginRPROI unique_ptr). I cannot call delete on the object since the destructor of INvPlugin is protected.
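For reference, this is the pattern from sampleFasterRCNN (quoting from memory, so names may differ slightly):

// deleter that forwards to destroy(), since plain delete is impossible
// with INvPlugin's protected destructor
void (*nvPluginDeleter)(INvPlugin*){[](INvPlugin* ptr) { ptr->destroy(); }};
std::unique_ptr<INvPlugin, decltype(nvPluginDeleter)> mPluginRPROI{nullptr, nvPluginDeleter};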

I will try to run valgrind; if I have no success I will prepare a minimal example and post it here.

Thanks.

I have made a small example to show the problem.
I reproduced the same issue with the PriorBox plugin:

https://www.dropbox.com/s/cfbfti5czjq9g7b/samplePriorBox.tar?dl=1

I can reproduce the problem. This is actually a bug in TensorRT: it frees the memory in both destroy() and the destructor for the plugin class. I have filed a bug for the developers to look into this issue. The only way to work around the problem is to not call destroy() and let the plugin destruct naturally, which can leak memory that is not freed until the process ends. With the following main() the error is worked around:

int main(int argc, char** argv)
{
    // create a GIE model from the caffe model and serialize it to a stream
    PluginFactory pluginFactory;
    PluginFactory pluginFactory2;
    IHostMemory* gieModelStream{nullptr};
    caffeToGIEModel("priorbox.prototxt", "priorbox.caffemodel", std::vector<std::string>{OUTPUT_BLOB_NAME}, 1, &pluginFactory, &gieModelStream);
    //pluginFactory.destroyPlugin();    // workaround: skip destroy() to avoid the double free

    // deserialize the engine
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(gieModelStream->data(), gieModelStream->size(), &pluginFactory2);

    IExecutionContext* context = engine->createExecutionContext();

    // run inference
    int outputBufferSize = GetBlobSize(*context, OUTPUT_BLOB_NAME);
    float* output = new float[outputBufferSize];
    float inputData[INPUT_H * INPUT_W * 3] = {0};

    doInference(*context, inputData, output, outputBufferSize, 1);

    // destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    //pluginFactory2.destroyPlugin();   // workaround: skip destroy() here as well

    delete[] output;

    return 0;
}

Thanks,
Your solution was very helpful.

Liran.

This should be fixed in the next release (TensorRT 4.0).

@LiranBachar @Liran

Yup, I am facing the same problem as well.

@Wahaj

Have you tried TensorRT 4.0? It should fix this problem. If not, then something else must be going on.

@ework

Thanks for the suggestion. Yes, I tried TensorRT 4 a month ago:

Firstly, it's not available for Jetson.
Secondly, on the x86 architecture its installation failed, maybe a bug or something, since TensorRT 3 was installed successfully.

You are correct, TensorRT 4 is not released for Jetson yet. Can you describe what issue you had when you tried to install TensorRT 4? We test many different scenarios, so if you have a situation where it failed to install, that would be great to know so we can fix it. Thanks.

Do you call destroy on the object? I don't know much about that.