I'm working on implementing a custom TensorRT layer (a PluginV2DynamicExt plugin) with two inputs and one output, wrapped with pybind11.
When I build the engine and run inference, it does not always give the right output: roughly 1 out of every 7~10 builds produces a wrong result.
I added some logging at the beginning of enqueue. vin (inputs[0]) always matches what I feed the network, but whenever the output is wrong, voff (inputs[1]) differs from what I set.
PS: voff[0] is always correct, even when the output is wrong.
const int k_copy_size = 5;

// Peek at the first few values of each input buffer (device -> host copy).
float vin[k_copy_size];
cudaMemcpy(vin, static_cast<const float*>(inputs[0]),
           sizeof(float) * k_copy_size, cudaMemcpyDeviceToHost);
std::cout << "vin:\t";
for (int i = 0; i < k_copy_size; ++i) {
    std::cout << vin[i] << "\t|";
}
std::cout << std::endl;

float voff[k_copy_size];
cudaMemcpy(voff, static_cast<const float*>(inputs[1]),
           sizeof(float) * k_copy_size, cudaMemcpyDeviceToHost);
std::cout << "voff:\t";
for (int i = 0; i < k_copy_size; ++i) {
    std::cout << voff[i] << "\t|";
}
std::cout << std::endl;
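(For reference, a small helper along these lines can also print what TensorRT reports for each tensor through the PluginTensorDesc arguments of enqueue; the helper below is only an illustration, not my actual plugin code.)

#include <iostream>
#include <NvInfer.h>

// Illustrative debug helper (not my actual plugin code): print the data type,
// format, and dims TensorRT reports for a tensor handed to
// IPluginV2DynamicExt::enqueue, e.g. dumpTensorDesc("offset", inputDesc[1]).
static void dumpTensorDesc(const char* label,
                           const nvinfer1::PluginTensorDesc& desc)
{
    std::cout << label
              << "  type=" << static_cast<int>(desc.type)
              << "  format=" << static_cast<int>(desc.format)
              << "  dims=";
    for (int i = 0; i < desc.dims.nbDims; ++i) {
        std::cout << desc.dims.d[i] << (i + 1 < desc.dims.nbDims ? "x" : " ");
    }
    // Anything other than kLINEAR means TensorRT reformatted this tensor
    // before passing it to the plugin.
    std::cout << (desc.format == nvinfer1::TensorFormat::kLINEAR
                      ? "(linear)" : "(NOT linear)")
              << std::endl;
}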
Here is how I build the engine and run inference in Python:
with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_builder_config() as config, \
        builder.create_network(EXPLICIT_BATCH) as network:
    input_layer = network.add_input(name="input", dtype=trt.float32,
                                    shape=(batch_size, in_channels, -1, -1))
    offset_layer = network.add_input(name="offset", dtype=trt.float32,
                                     shape=(batch_size, offset_channels, -1, -1))
    custom_layer = network.add_plugin_v2(
        inputs=[input_layer, offset_layer], plugin=custom_layer_plugin)
    custom_layer.get_output(0).name = "output"
    network.mark_output(custom_layer.get_output(0))

    print("\n### build engine and inference")
    config.max_workspace_size = max_workspace_size

    profile = builder.create_optimization_profile()
    profile.set_shape("input",
                      (batch_size, in_channels, 8, 8),
                      (batch_size, in_channels, 16, 16),
                      (batch_size, in_channels, 32, 32))
    profile.set_shape("offset",
                      (batch_size, offset_channels, 4, 4),
                      (batch_size, offset_channels, 8, 8),
                      (batch_size, offset_channels, 16, 16))
    config.add_optimization_profile(profile)

    with builder.build_engine(network, config) as engine, \
            engine.create_execution_context() as context:
        stream = cuda.Stream()
        binding_input_index = engine.get_binding_index("input")
        binding_offset_index = engine.get_binding_index("offset")
        binding_output_index = engine.get_binding_index("output")

        network_input = [None] * 3
        context.set_binding_shape(binding_input_index, tensor_input.shape)
        context.set_binding_shape(binding_offset_index, tensor_offset.shape)

        # zero the output buffer so stale results are obvious
        tensor_output *= 0
        network_input[binding_input_index] = tensor_input.data_ptr()
        network_input[binding_offset_index] = tensor_offset.data_ptr()
        network_input[binding_output_index] = tensor_output.data_ptr()

        start = time.time()
        context.execute_async_v2(network_input, stream_handle=stream.handle)
        stream.synchronize()
        end = time.time()

        predict_output = tensor_output.detach().cpu().numpy()
The input and offset tensors come from PyTorch. I also tried allocating them with PyCUDA and got the same error.
Why does this happen, and how do I fix it?
Environment:
1 Linux distro and version - Ubuntu 16.04
2 GPU type - 2080 Ti
3 NVIDIA driver version - 418.56
4 CUDA version - 10.0
5 cuDNN version - 7.4.2
6 Python version - 3.7
7 PyTorch version - 1.2.0
8 TensorRT version - 7.0.0.11
Thank you
Update 2020/1/15:
I set the log level to VERBOSE and found a log entry like the one below:
[TensorRT] VERBOSE: Adding reformat layer: (Unnamed Layer* 0) [PluginV2DynamicExt]
reformatted input 1 (offset)
from Float(1,(# 3 (SHAPE offset)),(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))),(* 18 (* (# 2 (SHAPE offset)) (# 3 (SHAPE offset)))))
to Float(1,(# 3 (SHAPE offset)),(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))):32,(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))))
This log line only appears in the builds that produce the wrong output.
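If I read the log correctly, the builder is reformatting the offset input into a vectorized layout before it reaches the plugin, which would explain why voff looks scrambled while vin stays correct. My guess is that this can only happen if my supportsFormatCombination accepts such a combination, so one fix might be to restrict every input and output to linear FP32, roughly like this (class name is just a placeholder; my real implementation differs):

// Sketch: reject everything except linear (non-vectorized) FP32 for both
// inputs and the output, so the builder never schedules a reformat to a
// vectorized layout in front of the plugin. "MyPlugin" is a placeholder name.
bool MyPlugin::supportsFormatCombination(int pos,
                                         const nvinfer1::PluginTensorDesc* inOut,
                                         int nbInputs, int nbOutputs)
{
    // pos indexes into inOut[0 .. nbInputs + nbOutputs - 1]
    const nvinfer1::PluginTensorDesc& desc = inOut[pos];
    return desc.type == nvinfer1::DataType::kFLOAT
        && desc.format == nvinfer1::TensorFormat::kLINEAR;
}

The alternative would be to keep the current supportsFormatCombination and instead read inputDesc[1].format in enqueue and handle the vectorized layout there, but forcing kLINEAR looks like the simpler change. Does that sound right, or is the reformat layer expected even for a linear-only plugin?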