Permute->Reshape->Concat fails

Hi,

I’m trying to apply a sequence of Permute + Reshape transforms to several layers and then Concat the results along the first axis. However, whenever I do this, it fails. I get the following on stdout (or stderr):

ERROR: Parameter check failed at: engine.cpp::enqueue::295, condition: bindings[x] != nullptr

Furthermore, the output data is numerically junk.
I also noticed that whenever I read the values of the Reshape output I get garbage, but when I read the Permute output I get the correct results.

Is this sequence of transforms supported by TensorRT, or should I write these basic operators from scratch?
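
To check which stage produces the junk values, one option is to compare TensorRT’s output against a small CPU reference. The sketch below assumes a single CHW tensor per layer, a Permute to HWC order, and a Reshape that only flattens. The function names are hypothetical, and this is not how TensorRT implements these layers.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// CPU reference for a Permute from CHW to HWC order (one image, no batch dimension).
std::vector<float> permuteChwToHwc(const std::vector<float>& src,
                                   std::size_t c, std::size_t h, std::size_t w) {
    if (src.size() != c * h * w) {
        throw std::invalid_argument("src size does not match c*h*w");
    }
    std::vector<float> dst(src.size());
    for (std::size_t ci = 0; ci < c; ++ci)
        for (std::size_t hi = 0; hi < h; ++hi)
            for (std::size_t wi = 0; wi < w; ++wi)
                dst[(hi * w + wi) * c + ci] = src[(ci * h + hi) * w + wi];
    return dst;
}

// A flattening Reshape does not move data, so Concat along the first axis of
// the flattened tensors amounts to appending the buffers in order.
std::vector<float> concatFlat(const std::vector<std::vector<float>>& parts) {
    std::vector<float> out;
    for (const auto& p : parts) out.insert(out.end(), p.begin(), p.end());
    return out;
}
```

If the Permute output matches this reference but the Reshape or Concat output does not, that narrows the problem to the later layers or to how their buffers are read back.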

Thanks

Hello,

To help us debug, can you provide a small repro that uses this operation sequence and demonstrates the error you are seeing?

Sure,

I’ve uploaded the files needed to reproduce this: https://drive.google.com/open?id=1_xWp2QCvKf24S1NhtS3qRfKHz0oN5eHG

Please let me know if you are able to reproduce this.
The files originate from the sampleSSD sample that comes with TensorRT.
Thanks

Hello,

Please share the prototxt file as well. Is the prototxt similar to the one working with the shipped sampleSSD?

The README has some instructions on changes that need to be made for the network to work with TRT:

## How to get caffe model

* Download models_VGGNet_VOC0712_SSD_300x300.tar.gz using
the link provided by the author of SSD: https://drive.google.com/file/d/0BzKzrI_SkD1_WVVTSmQxU0dVRzA/view
* Extract the contents. tar xvf models_VGGNet_VOC0712_SSD_300x300.tar.gz
* MD5 hash commands:
  md5sum models_VGGNet_VOC0712_SSD_300x300.tar.gz
  Expected MD5 hash:
  9a795fc161fff2e8f3aed07f4d488faf  models_VGGNet_VOC0712_SSD_300x300.tar.gz
* Edit deploy.prototxt and change all the “Flatten” layers to “Reshape” operations, with the following parameters:
  reshape_param {
    shape {
      dim: 0
      dim: -1
      dim: 1
      dim: 1
    }
  }
* Update the detection_out layer to add the keep_count output as expected by the TensorRT DetectionOutput Plugin.
  top: "keep_count"
* Rename the updated deploy.prototxt to ssd.prototxt and move to data directory
  mv ssd.prototxt <TensorRT_Install_Directory>/data/ssd
* Move the caffemodel file to the data directory.
  mv VGG_VOC0712_SSD_300x300_iter_120000.caffemodel <TensorRT_Install_Directory>/data/ssd

Additional instructions are in the README.md in the sample folder.
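
To illustrate what the Reshape parameters in the README do, here is a minimal sketch. It assumes Caffe’s usual Reshape conventions, where dim: 0 copies the input dimension at the same position and dim: -1 is inferred from the remaining element count. The function name resolveReshape is hypothetical and is not part of Caffe or TensorRT, and the shapes in the usage note are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a Caffe or TensorRT API): resolves a Caffe-style
// reshape spec against an input shape.
//    0 -> copy the input dimension at the same position
//   -1 -> infer the value so the total element count is preserved (at most one)
std::vector<std::int64_t> resolveReshape(const std::vector<std::int64_t>& in,
                                         const std::vector<std::int64_t>& spec) {
    std::int64_t total = 1;
    for (std::int64_t d : in) total *= d;

    std::vector<std::int64_t> out(spec.size(), 0);
    std::int64_t known = 1;   // product of all dimensions except the inferred one
    int inferred = -1;        // position of the -1 entry, if any
    for (std::size_t i = 0; i < spec.size(); ++i) {
        if (spec[i] == -1) {
            if (inferred != -1) throw std::invalid_argument("only one dim may be -1");
            inferred = static_cast<int>(i);
            continue;
        }
        if (spec[i] == 0) {
            if (i >= in.size()) throw std::invalid_argument("dim: 0 has no matching input dim");
            out[i] = in[i];
        } else {
            out[i] = spec[i];
        }
        known *= out[i];
    }
    if (inferred != -1) {
        if (known == 0 || total % known != 0) {
            throw std::invalid_argument("cannot infer the -1 dimension");
        }
        out[static_cast<std::size_t>(inferred)] = total / known;
    } else if (known != total) {
        throw std::invalid_argument("reshape changes the element count");
    }
    return out;
}
```

For example, resolveReshape({1, 16, 38, 38}, {0, -1, 1, 1}) yields {1, 23104, 1, 1}: the first dimension is kept and everything else is flattened into the second, which is the effect the replaced Flatten layers had.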

Hello, I built sampleSSD successfully and ran it following README.md and the README you posted.
But it still fails with this error:

$ ./sample_ssd --mode FP32
Begin parsing model...
FP32 mode running...
End parsing model...
Begin building engine...
[1]    6243 segmentation fault (core dumped)  ./sample_ssd --mode FP32

I looked at sampleSSD.cpp and it doesn’t use any SSD plugin layer, but the sample_ssd program links against libnvinfer_plugin.so.5. I am confused about this.

Hi, did you solve this problem? I encountered the same problem while using a deserialized engine for inference.
I modified the SSD sample to write the serialized engine to a local file, and deserialized the file in another project to run inference. Deserialization seems OK, but this error came up during inference:

ERROR: Parameter check failed at: engine.cpp::enqueue::295, condition: bindings[x] != nullptr

So what does this error indicate?

And there’s a warning message

WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

while deserializing, but I generated the engine plan file in the same environment (same TRT version and same GPU), so where did this one come from?

Thanks

@464948681, can you provide a small repro containing the modified SSD sample and the deserialization and inference code that demonstrates the error you are seeing?

@rawk.vx, what version of TRT are you using? Can you share the traceback from the segfault?

@dumbdog, please provide the prototxt to help us debug

@NVES, hi, I use TensorRT-5.0.2.6.

@NVES
I’m working on a wrapper for TensorRT inference. My environment is TensorRT-5.0.2.6, a Titan Xp GPU, CUDA 9.0.176, and cuDNN 7.3.1.

First I prepared the engine plan file with the SSD sample. My sampleSSD.cpp is forked from TensorRT-5.0.2.6/samples/sampleSSD, with several modifications to save the serialized engine to a local file so that it can be reused without parsing the network every time.

In the function caffeToTRTModel, I write the stream data to a file after serializing the engine, like this:

// Serialize the engine, then close everything down
(*trtModelStream) = engine->serialize();
nvinfer1::IHostMemory* gieModelStream = engine->serialize();
std::ofstream outfile(engine_file.c_str(), std::ios::out | std::ios::binary);
if (!outfile.is_open()) {
    fprintf(stderr, "fail to open engine file: %s\n", engine_file.c_str());
}
unsigned char* p = (unsigned char*)gieModelStream->data();
outfile.write((char*)p, gieModelStream->size());
outfile.close();
engine->destroy();
builder->destroy();

When testing, I call the function loadGIEEngine to deserialize the engine from the file, and use the engine to create an IExecutionContext for inference. loadGIEEngine looks like this:

nvinfer1::ICudaEngine* loadGIEEngine(const std::string planFilePath) {
    // reading the model
    std::cout << "Loading TRT Engine: " << planFilePath << std::endl;
    std::stringstream gieModelStream;
    gieModelStream.seekg(0, gieModelStream.beg);
    std::ifstream cache(planFilePath);
    assert(cache.good());
    gieModelStream << cache.rdbuf();
    cache.close();
    // calculating model size
    gieModelStream.seekg(0, std::ios::end);
    const int modelSize = gieModelStream.tellg();
    gieModelStream.seekg(0, std::ios::beg);
    void* modelMem = malloc(modelSize);
    gieModelStream.read((char*)modelMem, modelSize);
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(modelMem, modelSize, nullptr);
    free(modelMem);
    runtime->destroy();
    std::cout << "Loading Complete!" << std::endl;
    return engine;
}
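
As a more compact alternative for the file-reading part, here is a minimal sketch that opens the plan file in binary mode, reads it straight into a byte buffer, and reports failures with exceptions instead of assert (which is compiled out in release builds). The helper name readPlanFile is hypothetical. The buffer’s data() and size() would then be passed to runtime->deserializeCudaEngine as in the function above.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: loads an entire serialized engine (plan) file into memory.
std::vector<char> readPlanFile(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() gives the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open plan file: " + path);
    }
    const std::streamsize size = file.tellg();
    if (size <= 0) {
        throw std::runtime_error("plan file is empty or unreadable: " + path);
    }
    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(buffer.data(), size)) {
        throw std::runtime_error("short read on plan file: " + path);
    }
    return buffer;
}
```

Binary mode makes no difference on Linux, but it avoids byte translation on platforms where text mode alters line endings. This sketch only tidies the loading step and is not a confirmed fix for the enqueue error.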

Saving and reloading the engine tested OK, so I then used the engine plan file in another project. The same loadGIEEngine function is reused, and for the inference part I followed the original code in sampleSSD, except that I changed the type of the input pointer:

void TRTInference::doInference(nvinfer1::IExecutionContext& context, unsigned char* input, float* detOutput, int* keepCount, int batchSize) {
    // input and output buffer pointers that we pass to the engine - the engine requires exactly IEngine::getNbBindings(),
    // of these, but in this case we know that there is exactly 1 input and 2 output.
    const nvinfer1::ICudaEngine& engine = context.getEngine();
    assert(engine.getNbBindings() == 3);
    void* buffers[3];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // note that indices are guaranteed to be less than IEngine::getNbBindings()
    int inputIndex = engine.getBindingIndex(m_input_blob_name.c_str()),
        outputIndex0 = engine.getBindingIndex(m_det_output_blob_name.c_str()),
        outputIndex1 = engine.getBindingIndex(m_keep_count_blob_name.c_str());
    // Create GPU buffers and a stream
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * m_input_c * m_input_h * m_input_w * sizeof(float))); // Data
    CHECK(cudaMalloc(&buffers[outputIndex0], batchSize * m_keep_topk * 7 * sizeof(float)));               // Detection_out
    CHECK(cudaMalloc(&buffers[outputIndex1], batchSize * sizeof(int)));                                  // KeepCount (BBoxs left for each batch)
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    // DMA the input to the GPU,  execute the batch asynchronously, and DMA it back:
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * m_input_c * m_input_h * m_input_w * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(detOutput, buffers[outputIndex0], batchSize * m_keep_topk * 7 * sizeof(float), cudaMemcpyDeviceToHost, stream));
    CHECK(cudaMemcpyAsync(keepCount, buffers[outputIndex1], batchSize * sizeof(int), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
    // Release the stream and the buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex0]));
    CHECK(cudaFree(buffers[outputIndex1]));
}

The unsigned char* input comes from cv::Mat::data of a blob created by cv::dnn::blobFromImages; it represents a batch of images.
While testing this project, this warning came up:

WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

The engine deserialization seems OK, but this error

ERROR: Parameter check failed at: engine.cpp::enqueue::295, condition: bindings[x] != nullptr

came up here in the function doInference:

context.enqueue(batchSize, buffers, stream, nullptr);

I’m not sure what this error message means. What might be the cause, and where should I look?

Thanks for your reply.
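
The condition text in the message, bindings[x] != nullptr, suggests that at least one entry of the buffers array passed to enqueue was a null pointer. One way this can happen is if getBindingIndex returns -1 because a tensor name is not found in the engine, so the corresponding slot of the array is never filled. The sketch below is a hypothetical pre-flight check (checkBindings is not a TensorRT API) that could be called after the cudaMalloc calls and before enqueue. It assumes the buffers array was zero-initialized, for example void* buffers[3] = {nullptr, nullptr, nullptr};. This is one possible cause, not a confirmed diagnosis for this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pre-flight check (not a TensorRT API).
// indices: the values returned by getBindingIndex for each tensor name.
// buffers: the device pointers that will be passed to enqueue.
// Returns an empty string if everything looks usable, otherwise a description
// of the first problem found.
std::string checkBindings(const std::vector<int>& indices,
                          const std::vector<void*>& buffers) {
    for (std::size_t i = 0; i < indices.size(); ++i) {
        const int idx = indices[i];
        // A negative index means the tensor name was not found in the engine.
        if (idx < 0 || static_cast<std::size_t>(idx) >= buffers.size()) {
            return "binding index " + std::to_string(idx) +
                   " is out of range (tensor name not found?)";
        }
    }
    for (std::size_t i = 0; i < buffers.size(); ++i) {
        if (buffers[i] == nullptr) {
            return "bindings[" + std::to_string(i) + "] is nullptr";
        }
    }
    return "";
}
```

In doInference this would be called with {inputIndex, outputIndex0, outputIndex1} and the three buffer pointers. A non-empty result points at a blob name that does not match the names stored in the plan file, or at a buffer that was never allocated.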

@464948681 When I run inference with an engine created from a UFF file using the C++ API, I get the same error:

ERROR: Parameter check failed at: engine.cpp::enqueue::295, condition: bindings[x] != nullptr

Have you solved it?

@AndrewGong, I still haven’t figured out what’s wrong with this one.

Hello,

Per engineering, we believe we have a fix to make this work for a future TRT release. I can’t share the release schedule here, but please stay tuned.

@464948681 After I removed all the bugs in my code, it works and this error no longer shows up. Very strange. So you may want to check your code again. Good luck.

Hi NVES, I encountered the same error when I deployed my TensorRT plugin project (which runs normally on my PC with TensorRT 5.1.2) to Pegasus.
The error message is “Parameter check failed at: engine.cpp::enqueue::295”.
My Pegasus working environment is Drive Software 9.0 with TensorRT 5.0.x.
Please let me know whether I also have to wait for the future TRT release on Drive OS. Thanks!
