I am trying to run SSD with TensorRT 3.0.1.
I took the sampleFasterRCNN sample and expanded it to create the required plugins for SSD.
The GIE model is built successfully, but when I call pluginFactory.destroyPlugin() I get an error:
NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
terminate called after throwing an instance of ‘nvinfer1::CudaError’
what(): std::exception
It looks like it does the copy from the host to the device inside the class, so that's not the problem. I see it does expect the data to be of type float, or at least sizeof(float). To rule out any memory-bounds issues, could you try running your program with cuda-memcheck?
NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
terminate called after throwing an instance of ‘nvinfer1::CudaError’
what(): std::exception
========= Error: process didn’t terminate successfully
========= Internal error (7)
========= No CUDA-MEMCHECK results found
You will need to catch the exception and call exit(1) or something similar. cuda-memcheck won't be able to give a report unless the program can exit cleanly.
I wrapped the call to plugin destroy in a try/catch block and added exit(1) in the catch.
Now the output of cuda-memcheck is:
…
Block size 1048576
Block size 165888
Total Activation Memory: 4324064
NvPluginSSD.cu (56) - Cuda Error in ~Normalize: 17
========= Internal error (7)
========= No CUDA-MEMCHECK results found
I'm not sure why that internal pointer is getting corrupted. If you have the ability to run your code with AddressSanitizer or valgrind, that might be a good next step. Do you call destroy() on the object? It also frees on destruction, so calling destroy() might cause a double free of sorts.
Yes, I am calling destroy on the object, just like in sampleFasterRCNN where PluginFactory::destroyPlugin calls destroy() on the plugin layer (via the deleter function given to mPluginRPROI unique_ptr). I cannot call delete on the object since the destructor of INvPlugin is protected.
I will try to run valgrind; if I have no success I will prepare a minimal example and post it here.
I can reproduce the problem. This is actually a bug in TensorRT: it frees the memory in both destroy() and the destructor of the plugin class. I have filed a bug for the developers to look into this issue. The only way to work around the problem is to not call destroy() and let the object destruct naturally, which could leak memory that is not freed until the process ends. With the following main() the error is worked around.
int main(int argc, char** argv)
{
    // create a GIE model from the caffe model and serialize it to a stream
    PluginFactory pluginFactory;
    PluginFactory pluginFactory2;
    IHostMemory* gieModelStream{ nullptr };
    caffeToGIEModel("priorbox.prototxt", "priorbox.caffemodel",
                    std::vector<std::string>{ OUTPUT_BLOB_NAME }, 1,
                    &pluginFactory, &gieModelStream);
    //pluginFactory.destroyPlugin();   // workaround: skip destroy()

    // deserialize the engine
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(gieModelStream->data(),
                                                         gieModelStream->size(),
                                                         &pluginFactory2);
    IExecutionContext* context = engine->createExecutionContext();

    // run inference
    int outputBufferSize = GetBlobSize(*context, OUTPUT_BLOB_NAME);
    float* output = new float[outputBufferSize];
    float inputData[INPUT_H * INPUT_W * 3] = { 0 };
    doInference(*context, inputData, output, outputBufferSize, 1);

    // destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    //pluginFactory2.destroyPlugin();  // workaround: skip destroy()

    delete[] output;
    return 0;
}
Thanks for the suggestion. Yes, I tried TensorRT 4 a month ago.
Firstly, it's not available for Jetson.
Secondly, on the x86 architecture its installation failed (maybe a bug or something), whereas TensorRT 3 installed successfully.
You are correct, TensorRT 4 is not released for Jetson yet. Can you describe what issue you had when you tried to install TensorRT 4? We try to test many different scenarios, so if you have a situation where it failed to install, that would be great to know so we can fix it. Thanks.