Description
Hi! I have referred to this, and here is my case:
I use torch2trt to convert my PyTorch model (ERFNet) to a TensorRT engine.
Since the TensorRT API in torch2trt requires version 5 or higher, I modified some of the API calls to fit my TensorRT 4, because that version is compatible with my DRIVE PX2 (DriveWorks 1.2, CUDA 9.2), to which I want to port the TensorRT engine.
I think the model converts successfully; below is my model's architecture as traced by torch2trt.
torch.Tensor.get_device
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.max_pool2d
torch.cat
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.Dropout2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.functional.relu
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.Tensor.__add__
torch.nn.functional.relu
torch.nn.ConvTranspose2d.forward
torch.nn.Conv2d.forward
torch.nn.BatchNorm2d.forward
torch.nn.functional.relu
torch.nn.Dropout2d.forward
torch.nn.Conv2d.forward
torch.nn.functional.softmax
torch.nn.functional.max_pool2d
torch.Tensor.view
torch.nn.Linear.forward
torch.nn.functional.relu
torch.nn.Linear.forward
torch.nn.functional.sigmoid
The model's input size is (1 x 3 x 208 x 976) and the output size is (1 x 5 x 202 x 970 + 4).
I use tensorrt.utils.write_engine_to_file to save the serialized engine.
After generating the engine, I want to deserialize it and run inference in C++.
std::cout << "*** Deserializing ***" << std::endl;
mTrtRunTime = createInferRuntime(gLogger);
assert(mTrtRunTime != nullptr);
mTrtEngine = mTrtRunTime->deserializeCudaEngine(data.get(), length, nullptr);
assert(mTrtEngine != nullptr);

std::cout << "*** Initialize the engine ***" << std::endl;
const int maxBatchSize = 1;
mTrtContext = mTrtEngine->createExecutionContext();
assert(mTrtContext != nullptr);
mTrtContext->setProfiler(&mTrtProfiler);

// Allocate one device buffer per binding -- exactly IEngine::getNbBindings() of them.
int nbBindings = mTrtEngine->getNbBindings();
mTrtCudaBuffer.resize(nbBindings);
mTrtBindBufferSize.resize(nbBindings);
for (int i = 0; i < nbBindings; ++i)
{
    Dims dims = mTrtEngine->getBindingDimensions(i);
    DataType dtype = mTrtEngine->getBindingDataType(i);
    int64_t totalSize = volume(dims) * maxBatchSize * getElementSize(dtype);
    mTrtBindBufferSize[i] = totalSize;
    mTrtCudaBuffer[i] = safeCudaMalloc(totalSize);
    if (mTrtEngine->bindingIsInput(i))
        mTrtInputCount++;
}
CUDA_CHECK(cudaStreamCreate(&mTrtCudaStream));
Up to this point, no error occurs.
Then I feed a random float vector into the model and run inference. The output is all zeros or NaN. I have already checked the data type, but it still does not produce any normal numbers.
vector<float> inputData(h * w * c); // 208 * 976 * 3
std::generate(inputData.begin(), inputData.end(), []() {
    return float(rand() % 255);
});
vector<float> outputData;
outputData.resize(net.getOutputSize() / sizeof(float));

std::cout << "*** Inference ***" << std::endl;
static const int batchSize = 1;
assert(mTrtInputCount == 1);
int inputIndex = 0;
CUDA_CHECK(cudaMemcpyAsync(mTrtCudaBuffer[inputIndex], inputData.data(), mTrtBindBufferSize[inputIndex], cudaMemcpyHostToDevice, mTrtCudaStream));
CUDA_CHECK(cudaStreamSynchronize(mTrtCudaStream)); // ensure the input is on the device before execute()

auto t_start = std::chrono::high_resolution_clock::now();
mTrtContext->execute(batchSize, &mTrtCudaBuffer[inputIndex]);
auto t_end = std::chrono::high_resolution_clock::now();
float total = std::chrono::duration<float, std::milli>(t_end - t_start).count();
std::cout << "Time taken for inference is " << total << " ms." << std::endl;

// Copy each output binding back to the host, advancing the destination by its size.
char *dst = reinterpret_cast<char *>(outputData.data());
for (size_t bindingIdx = mTrtInputCount; bindingIdx < mTrtBindBufferSize.size(); ++bindingIdx) {
    auto size = mTrtBindBufferSize[bindingIdx];
    CUDA_CHECK(cudaMemcpyAsync(dst, mTrtCudaBuffer[bindingIdx], size, cudaMemcpyDeviceToHost, mTrtCudaStream));
    dst += size;
}
CUDA_CHECK(cudaStreamSynchronize(mTrtCudaStream)); // wait for the copies before reading outputData
mTrtIterationTime++;
How can I solve this kind of problem? Does it have something to do with the TensorRT version? And how can I verify that the generated engine really works?
Environment
TensorRT Version: 4.0.1.6
GPU Type: GeForce GTX 1070
Nvidia Driver Version: 396.26
CUDA Version: 9.2
CUDNN Version: 7
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 2.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.0.0
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered