Hi,
I’ve been trying to debug pgie because it gives different results than the original PyTorch model. I first compared the PyTorch model against the TensorRT engine (FP32 mode) directly, and their outputs match closely, as expected. However, when I run the full DeepStream pipeline, the results differ.
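For reference, my PyTorch-vs-TensorRT check was along these lines (a minimal sketch; torch_out and trt_out are illustrative names for the flattened FP32 outputs of the two backends on the same preprocessed input):
import numpy as np

def outputs_match(torch_out: np.ndarray, trt_out: np.ndarray, atol: float = 1e-3) -> bool:
    # compare the two flattened FP32 output tensors elementwise
    diff = np.abs(torch_out - trt_out)
    print(f"max abs diff: {diff.max():.6f}, mean abs diff: {diff.mean():.6f}")
    return bool(np.allclose(torch_out, trt_out, atol=atol))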
Here’s my pipeline:
nvstreammux -> nvdspreprocess -> nvinfer -> ...
I verified the output of nvstreammux. Both the padding and the size are correct. The color format does not look like RGB, but I assume the downstream pgie module will take care of that.
To extract the output from nvstreammux, I followed this example.
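The probe I attached on the nvstreammux src pad looks roughly like this (a sketch along the lines of the deepstream-imagedata-multistream sample; pyds.get_nvds_buf_surface assumes the mux pool uses RGBA surfaces in NVBUF_MEM_CUDA_UNIFIED memory on dGPU):
import numpy as np
import pyds
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def mux_src_pad_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # RGBA surface exposed as an H x W x 4 numpy array
        frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        np.array(frame, copy=True, order='C').tofile(
            f"mux_frame_{frame_meta.frame_num}.bin")
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK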
Because my pipeline uses nvdspreprocess for preprocessing, I rebuilt both libnvdsgst_preprocess.so and libcustom2d_preprocess.so with the DEBUG_LIB and DEBUG_TENSOR flags enabled, which makes the plugin save impl_out_batch_xxx.bin files. When I checked the size of a dump, it wasn’t right:
import numpy as np
tensor = np.fromfile('xxx/tensorout_batch_0.bin')
tensor.shape
<<< (992256,)
My network takes width = 1088 and height = 608, so the tensor should have 1088 × 608 × 3 = 1984512 elements. How should we inspect the output of nvdspreprocess when these two flags are enabled?
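For what it’s worth, this is how I would read the dump assuming it contains raw FP32 data (note that np.fromfile defaults to float64, which would halve the reported element count on an FP32 file):
import numpy as np

expected = 1088 * 608 * 3  # = 1984512 elements for one 3 x 608 x 1088 tensor
# assuming the dump is raw FP32, pass the dtype explicitly instead of
# relying on np.fromfile's float64 default
tensor = np.fromfile('xxx/tensorout_batch_0.bin', dtype=np.float32)
print(tensor.size, expected)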
After this failed attempt, I tried to verify the input of pgie instead, since it should be identical to the output of nvdspreprocess. I followed this suggestion and applied the patch file, but I put the image-saving change in queueInputBatchPreprocessed instead of queueInputBatch, since preprocessing in pgie is disabled in favor of nvdspreprocess. The snippet in that function looks as follows:
#ifdef DUMP_INPUT_TO_FILE
#define DUMP_FRAME_CNT_START (0)
#define DUMP_FRAME_CNT_STOP (10)
    if ((m_FrameCnt++ >= DUMP_FRAME_CNT_START) &&
        (m_FrameCnt <= DUMP_FRAME_CNT_STOP))
    {
        printf("batchDims.batchSize = %d\n", batchSize);
        assert(m_AllLayerInfo.size());
        for (size_t i = 0; i < m_AllLayerInfo.size(); i++)
        {
            NvDsInferLayerInfo &info = m_AllLayerInfo[i];
            assert(info.inferDims.numElements > 0);
            if (info.isInput)
            {
                /* Bytes per tensor in the batch. */
                int sizePerBatch =
                    getElementSize(info.dataType) * info.inferDims.numElements;
                cudaDeviceSynchronize();
                for (int b = 0; b < batchSize; b++)
                {
                    /* Pointer to the b-th input tensor of this binding. */
                    float *indBuf =
                        (float *)bindings[i] + b * info.inferDims.numElements;
                    int w = info.inferDims.d[2];
                    int h = info.inferDims.d[1];
                    printf("width = %d, height = %d\n", w, h);
                    printf("sizePerBatch: %d, inferDims.numElements = %u\n",
                           sizePerBatch, info.inferDims.numElements);
                    // if (scale < 1.0)
                    // {
                    //     // R or B
                    //     NvDsInferConvert_FtFTensor(
                    //         (float *)m_inputDumpDeviceBuf,
                    //         indBuf, w, h, w, 1 / scale, NULL, NULL);
                    //     // G
                    //     NvDsInferConvert_FtFTensor(
                    //         ((float *)m_inputDumpDeviceBuf + w * h),
                    //         ((float *)indBuf + w * h),
                    //         w, h, w, 1 / scale, NULL, NULL);
                    //     // B or R
                    //     NvDsInferConvert_FtFTensor(
                    //         ((float *)m_inputDumpDeviceBuf + 2 * w * h),
                    //         ((float *)indBuf + 2 * w * h),
                    //         w, h, w, 1 / scale, NULL, NULL);
                    //     indBuf = (float *)m_inputDumpDeviceBuf;
                    // }
                    RETURN_CUDA_ERR(
                        cudaMemcpy(m_inputDumpHostBuf, (void *)indBuf, sizePerBatch,
                            cudaMemcpyDeviceToHost),
                        "cudaMemcpy of input tensor for dumping failed");
                    bool dumpToRaw = false;
                    std::string filename =
                        "gie-" + std::to_string(m_UniqueID) +
                        "_input-" + std::to_string(i) +
                        "_batch-" + std::to_string(b) +
                        "_frame-" + std::to_string(m_FrameCnt);
                    if (dumpToRaw)
                        filename += ".raw";
                    else
                        filename += ".png";
                    NvDsInferFormat format = NvDsInferFormat_RGB;
                    dump_to_file(filename.c_str(),
                        (unsigned char *)m_inputDumpHostBuf, sizePerBatch,
                        w, h, dumpToRaw, format);
                }
            }
        }
    }
#endif
Files are saved and the shape is correct, but every image is completely black, which doesn’t seem right to me. I checked the shape like this:
import cv2
img = cv2.imread('/home/coco/workspace/sertis/object-tracking-app/gie-1_input-0_batch-0_frame-1.png')
img.shape
<<< (608, 1088, 3)
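If I switch dumpToRaw to true, I would expect to be able to rebuild a viewable image from the raw dump roughly like this (a sketch, assuming FP32 CHW RGB data normalized with my factor/offsets; the .raw filename follows the naming in the patch above):
import numpy as np
import cv2

w, h = 1088, 608
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # my offsets
chw = np.fromfile('gie-1_input-0_batch-0_frame-1.raw',
                  dtype=np.float32).reshape(3, h, w)
hwc = np.transpose(chw, (1, 2, 0))   # CHW -> HWC
rgb = hwc / 0.003921568 + mean       # undo out = factor * (x - mean)
bgr = cv2.cvtColor(np.clip(rgb, 0, 255).astype(np.uint8), cv2.COLOR_RGB2BGR)
cv2.imwrite('frame1_check.png', bgr)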
In this case, what could be going wrong? Is this the right way to debug the preprocessed tensors?
Please find my configs below.
[property]
enable=1
target-unique-ids=1
# network-input-shape: batch, channel, height, width
network-input-shape=1;3;608;1088
# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
tensor-name=images
processing-width=1088
processing-height=608
# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE
# 3=NVBUF_MEM_CUDA_UNIFIED 4=NVBUF_MEM_SURFACE_ARRAY(Jetson)
scaling-pool-memory-type=0
# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU
# 2=NvBufSurfTransformCompute_VIC(Jetson)
scaling-pool-compute-hw=0
# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0
# model input tensor pool size
tensor-buf-pool-size=8
# custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/lib/gst-plugins/libcustom2d_preprocess.so
custom-lib-path=/opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess/nvdspreprocess_lib/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation
[user-configs]
pixel-normalization-factor=0.003921568
# Currently, nvdspreprocess is in alpha stage, thus std scaling is not yet supported.
# The preprocessing logic is as follows:
#
# out = pixel-normalization-factor * (x - mean[c])
#
# more detail, see https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdspreprocess.html
# ByteTrack's mean: 0.485, 0.456, 0.406
# ByteTrack's std: 0.229, 0.224, 0.225
# rescale back to [0-255]:
# 0.485 * 255 = 123.675
# 0.456 * 255 = 116.28
# 0.406 * 255 = 103.53
offsets=123.675;116.28;103.53
[group-0]
# set src-ids=-1 to use batch_size in network-input-shape
src-ids=-1
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=0
# roi-params-src-0=0;0;1088;608
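To make the std-scaling comment in [user-configs] concrete, here is what the plugin computes for a saturated R pixel versus ByteTrack’s full normalization:
# out = pixel-normalization-factor * (x - mean[c])
factor = 0.003921568               # = 1/255
mean_r = 123.675
x = 255.0                          # fully saturated R pixel
print(factor * (x - mean_r))       # ~0.515
# ByteTrack's full normalization would give (1.0 - 0.485) / 0.229 ≈ 2.249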
Model: here
Note: I already disabled pgie’s default preprocessing by setting input-tensor-meta=1, and I also set model-color-format=0 for RGB input.
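In the app this is set on the nvinfer element roughly like so (a sketch; property name as used in the nvdspreprocess sample):
# make pgie consume the input tensors attached by nvdspreprocess
pgie.set_property("input-tensor-meta", True)
# model-color-format=0 (RGB) is set in the pgie config file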
Environment
Architecture: x86_64
GPU: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
NVIDIA GPU Driver: 495.29.05
DeepStream Version: 6.0 (running on docker image nvcr.io/nvidia/deepstream:6.0-devel)
TensorRT Version: 8.0.1 (v8001)
Issue Type: Question