Cuda Error in nvinfer1::cudnn::findFastestTactic

I am running TensorRT inference with a model written in TensorFlow and converted to UFF. The model is saved with an input size of (1, 601, 400). However, when I try to create the TensorRT engine (at the buildCudaEngine call) with the input height set to 480, I get the following error:

INFO: --------------- Timing deconv4/conv2d_transpose(4)
INFO: Tactic 1 time 2.20518
ERROR: c:\p4sw\sw\gpgpu\MachineLearning\DIT\release.0\builder\cudnnBuilderUtils.cpp (255) - Cuda Error in nvinfer1::cudnn::findFastestTactic: 77
ERROR: c:\p4sw\sw\gpgpu\MachineLearning\DIT\release.0\engine\runtime.cpp (30) - Cuda Error in nvinfer1::`anonymous-namespace'::DefaultAllocator::free: 77

If I change the input height to 478 or 481, without changing anything else, the engine builds and I can run inference without any issues. A height of 479 does not work either. I also tried creating the engine from a different UFF file with the input set to (1, 480, 400), and got the same error. Why could I be getting this error, and how can I fix it?

To help us debug, can you please share a small repro that demonstrates the engine build errors you are seeing?

Hello NVES,

It seems to be network-specific; should I share my network model with you as well?

Yes, please.

Thank you. Here is my engine creation code:

int imageH = 480;
int imageW = 400;
int maxBatchSize = 1;
string uffFile = "unet_601_400.uff";
string inputName = "Placeholder";
string outputName = "outconv/BiasAdd";

ICudaEngine* cudaEngine{ nullptr };
auto parser = createUffParser();
parser->registerInput(inputName.c_str(), Dims3(1, imageH, imageW), UffInputOrder::kNCHW);
parser->registerOutput(outputName.c_str());

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetwork();
std::cout << "Parsing the network. This might take up to a minute\n";
if (!parser->parse(uffFile.c_str(), *network, nvinfer1::DataType::kFLOAT))
	std::cerr << "ERROR: Cannot parse " << uffFile << std::endl;

builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(MAX_WORKSPACE);
cudaEngine = builder->buildCudaEngine(*network); // fails here with Cuda Error 77

network->destroy();
builder->destroy();
parser->destroy();

I sent you the UFF file via DM.

Hello,

per engineering:

1. We cannot reproduce this issue on Linux with the current release (5.0.2) or the next release. We tested parameter ranges like this:

for (int imageH = 475; imageH < 485; ++imageH)
        for (int imageW = 395; imageW < 405; ++imageW)
            for (int maxBatchSize = 1; maxBatchSize < 5; ++maxBatchSize)

2. From the error message in the bug description, we suspect a kernel failure; the next call to cudaFree then caught this error. Error 77 means:
/**
* The device encountered a load or store instruction on an invalid memory address.
* This leaves the process in an inconsistent state and any further CUDA work
* will return the same error. To continue using CUDA, the process must be terminated
* and relaunched.
*/
cudaErrorIllegalAddress = 77,
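Because this error is "sticky", the API call that reports it is usually not the one that caused it, which is why both findFastestTactic and DefaultAllocator::free show error 77 above. An illustrative sketch of the pattern (requires nvcc and a CUDA device to run; the kernel is deliberately broken):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Deliberately broken kernel: stores through an invalid device pointer.
__global__ void badKernel(int* p) { *p = 42; }

int main() {
    badKernel<<<1, 1>>>(nullptr);            // illegal address on the device
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("after sync: %s\n", cudaGetErrorString(err));

    // The error is sticky: even an otherwise harmless call such as
    // cudaFree(nullptr) now reports the same error 77.
    err = cudaFree(nullptr);
    std::printf("after free: %s\n", cudaGetErrorString(err));
    return 0;
}
```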

3. The tactic number in the log is 1, which points to a cuDNN-related call: 1) CASK kernels use hashed values as tactic IDs, so those numbers are very large; 2) inside TRT, the cuDNN algorithm type (range 0-7) is used as the tactic ID when the layer is cuDNN-related.

So our suggestions are:

  1. Link against the next version of TRT (coming soon) and check whether it works.
  2. Use ldd to check that the correct version of cuDNN is being loaded, or re-install the latest version of cuDNN, and re-run.

For reference, the engineering test code snippet is attached.

nvbug2485932_loop_width_height_batch.cu (1.92 KB)