Refitting an engine: how to set new weights on a refittable engine?

I want to change the weights of the convolution layer in an engine.
The engine is built from an ONNX file containing only one convolution layer, whose filter size is [64, 3, 3, 3], and is then serialized.
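
For reference, the engine is built roughly as follows (a minimal sketch; the ONNX file path is a placeholder), with BuilderFlag::kREFIT set so that the engine is refittable:

IBuilder* builder = createInferBuilder(gLogger);
const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
INetworkDefinition* network = builder->createNetworkV2(explicitBatch);
auto parser = nvonnxparser::createParser(*network, gLogger);
parser->parseFromFile("single_conv.onnx", static_cast<int>(ILogger::Severity::kWARNING));
IBuilderConfig* config = builder->createBuilderConfig();
config->setMaxWorkspaceSize(1 << 28);
config->setFlag(BuilderFlag::kREFIT);   // required for the engine to be refittable
ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
// This engine is serialized to disk and later deserialized as mEngine.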

The code is:

// Prepare the replacement kernel weights for the [64, 3, 3, 3] convolution.
// The backing memory must stay valid until refitCudaEngine() has returned.
Weights newWeights_tmp;
int wCount = 64 * 3 * 3 * 3;
newWeights_tmp.count = wCount;
float* newWeightsLocal = new float[wCount];
for (int i = 0; i < wCount; i++)
{
	newWeightsLocal[i] = 0.0001f * i;
}
newWeights_tmp.values = newWeightsLocal;
newWeights_tmp.type = DataType::kFLOAT;

// Create a refitter bound to the deserialized engine.
IRefitter* refitter = createInferRefitter(*mEngine, gLogger);
std::cout << "// ----------------------------------- find out all fittablelayers ---------------------------------- //" << std::endl;
// Calling getAll with size 0 only returns the number of refittable weights.
int num_fittablelayer = refitter->getAll(0, nullptr, nullptr);
	
std::cout << "num_fittablelayer = " << num_fittablelayer << std::endl;

std::vector<const char*> fittableLayerNames(num_fittablelayer);
std::vector<WeightsRole> fittableweightsRoles(num_fittablelayer);
refitter->getAll(num_fittablelayer, fittableLayerNames.data(), fittableweightsRoles.data());
for (int i = 0; i < num_fittablelayer; i++)
{
	nvinfer1::WeightsRole bType = fittableweightsRoles[i];
	using weights_roles_type = std::underlying_type<nvinfer1::WeightsRole>::type;
	std::cout << "fittableLayerNames = " << fittableLayerNames[i] << std::endl;
	std::cout << "fittableWeightsRoles = " << static_cast<weights_roles_type>(bType) << std::endl;
}


std::cout << "// ----------------------------------- setWeights ---------------------------------- //" << std::endl;
// Layer name and role as reported by getAll above.
bool weightSuccess = refitter->setWeights("testnet0_conv0_fwd", WeightsRole::kKERNEL, newWeights_tmp);
std::cout << "weightSuccess = " << weightSuccess << std::endl;

// getMissing reports weights that still have to be supplied before refitting.
const int n = refitter->getMissing(0, nullptr, nullptr);
std::cout << "refitter->getMissing = " << n << std::endl;

// Apply the supplied weights to the engine in place.
bool success = refitter->refitCudaEngine();
assert(success);
refitter->destroy();

Running the code above after the engine is loaded, I get the following output:

// ----------------------------------- find out all fittablelayers ---------------------------------- //
num_fittablelayer = 1
fittableLayerNames = testnet0_conv0_fwd
fittableWeightsRoles = 0

// ----------------------------------- setWeights ---------------------------------- //
weightSuccess = 1
refitter->getMissing = 0
[05/29/2020-16:16:40] [F] [TRT] Assertion failed: tDims.nbDims > 3
C:\source\rtSafe\tensorLayout.cpp:19
Aborting...

But when I use an engine built from an ONNX file with a single convolution layer whose filter size is [1, 1, 3, 3], and set wCount = 9 in the code above, it succeeds.

What is the cause of the error? setWeights() returns true and getMissing() returns 0, so why does refitCudaEngine() fail? How should I define the new weights correctly?

Other information:
GPU: RTX 2080 Ti
TensorRT version: TensorRT-7.0.0.11.Windows10.x86_64.cuda-10.0.cudnn7.6
CUDA version: 10.0
cuDNN version: 7.6.5

Installing a newer TensorRT, CUDA, and cuDNN solves the problem above:
TensorRT version: TensorRT-7.1.3.4.Windows10.x86_64.cuda-11.0.cudnn8.0
CUDA version: 11.0 RC
cuDNN version: 8.0.0 RC

With these versions, refitter->refitCudaEngine() returns true.

If I change the weights of only one layer, inference runs as usual.
But if I change the weights of more than one layer, project.exe stops responding when I run inference. The hang occurs at context->executeV2(myBindings.data()), and the disk active time reaches 100% at that point.
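
Concretely, for the multi-layer case I do roughly the following (a sketch: the second layer name "testnet0_conv1_fwd" and the Weights variables newConv0Kernel / newConv1Kernel are placeholders, filled the same way as newWeights_tmp above):

IRefitter* refitter = createInferRefitter(*mEngine, gLogger);

// Supply new weights for every layer to be updated (names as reported by getAll).
refitter->setWeights("testnet0_conv0_fwd", WeightsRole::kKERNEL, newConv0Kernel);
refitter->setWeights("testnet0_conv1_fwd", WeightsRole::kKERNEL, newConv1Kernel);

// Updating one weight set can make others mandatory; getMissing lists them.
const int nMissing = refitter->getMissing(0, nullptr, nullptr);
std::vector<const char*> missingNames(nMissing);
std::vector<WeightsRole> missingRoles(nMissing);
refitter->getMissing(nMissing, missingNames.data(), missingRoles.data());
for (int i = 0; i < nMissing; i++)
{
	// ... supply each missing weight set with setWeights() here ...
}

bool ok = refitter->refitCudaEngine();   // only called once getMissing reports 0
assert(ok);
refitter->destroy();

// Inference afterwards uses the existing execution context as before:
// context->executeV2(myBindings.data());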

How do I correctly change the weights of a refittable engine and then run inference with the updated engine?

Does anyone know? Looking forward to your reply! Thank you!