About the Python API convolution parameters and 2D depthwise convolution

Ubuntu 16.04 LTS
GPU type:1050Ti
nvidia driver version:390.87
CUDA version:9.0
CUDNN version:7.13
Python version:3.5
TensorRT version:

I want to add a 2D depthwise convolution layer to my network. I tried it like this:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import scipy.stats as st
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def _normalize(t):
    return t / t.sum()

def postprocess(network):
    input_tensor = network.add_input(name='data', dtype=trt.float32, shape=(1, 368, 432))
    ksize = 25
    nsig = 3.0
    interval = (2 * nsig + 1.) / ksize
    x = np.linspace(-nsig - interval / 2., nsig + interval / 2., ksize + 1)
    y = np.diff(st.norm.cdf(x))
    gk = _normalize(np.sqrt(np.outer(y, y)))
    # one (1, ksize, ksize) filter per channel, replicated across 19 channels
    filters = np.outer(gk, np.ones([19])).T.reshape((19, 1, ksize, ksize))
    # this call raises the TypeError shown below
    # (call reconstructed here to match the error message)
    conv = network.add_convolution(input=input_tensor, num_output_maps=1,
                                   kernel_shape=(ksize, ksize), kernel=filters,
                                   num_groups=19,
                                   bias=trt.Weights(np.zeros(19, dtype=np.float32)))

def build_engine():
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
        builder.max_workspace_size = 1 << 20
        postprocess(network)
        return builder.build_cuda_engine(network)


But it didn’t work:
Invoked with: <tensorrt.tensorrt.INetworkDefinition object at 0x7f1c17173bc8>; kwargs: input=<tensorrt.tensorrt.ITensor object at 0x7f1c17173c38>, kernel_shape=(25, 25), kernel=array([[[[…]]]]), num_output_maps=1,num_groups=19,bias=<tensorrt.tensorrt.Weights object as
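To rule out a mistake in the kernel itself, I re-ran the filter-building code in plain numpy, outside TensorRT. The shape comes out as expected, but I did notice the array dtype is float64 rather than float32; I am not sure whether that is related to the error:

```python
import numpy as np
import scipy.stats as st

def _normalize(t):
    return t / t.sum()

ksize = 25
nsig = 3.0
interval = (2 * nsig + 1.) / ksize
x = np.linspace(-nsig - interval / 2., nsig + interval / 2., ksize + 1)
y = np.diff(st.norm.cdf(x))
gk = _normalize(np.sqrt(np.outer(y, y)))  # 25x25 Gaussian kernel, sums to 1
# replicate the same kernel once per channel, 19 channels total
filters = np.outer(gk, np.ones([19])).T.reshape((19, 1, ksize, ksize))

print(filters.shape)  # (19, 1, 25, 25)
print(filters.dtype)  # float64 -- TensorRT weights expect float32
print(np.allclose(filters[0, 0], gk))  # each channel holds the same kernel
```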

Here are my questions:
1. Why can't a numpy.array be used as the kernel this way? I tried the same idea in the official sample network_api_pytorch_mnist and got the same error.
2. I want to use num_groups to do the depthwise convolution. Does this idea work, and have I set the parameters and shapes correctly? I read https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Graph/Layers.html about IConvolutionLayer and am still not sure about my shape settings.
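For reference, this is the computation I expect the grouped convolution to perform when num_groups equals the channel count. A minimal pure-numpy sketch, just to pin down the shapes (the helper name is mine, not a TensorRT API; no padding or stride, 'valid' output size, and cross-correlation rather than flipped convolution, as deep-learning frameworks do):

```python
import numpy as np

def depthwise_conv2d(x, filters):
    """Naive depthwise cross-correlation: channel c of x is combined
    only with filters[c, 0] (i.e. num_groups == number of channels)."""
    c, h, w = x.shape
    kc, _, kh, kw = filters.shape
    assert kc == c, "one filter per input channel"
    oh, ow = h - kh + 1, w - kw + 1  # 'valid' output size
    out = np.zeros((c, oh, ow), dtype=x.dtype)
    for ch in range(c):
        k = filters[ch, 0]
        for i in range(oh):
            for j in range(ow):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * k)
    return out

# small sizes just to check shapes: 19 channels in, 19 channels out
x = np.random.rand(19, 8, 8).astype(np.float32)
filters = np.ones((19, 1, 3, 3), dtype=np.float32)
y = depthwise_conv2d(x, filters)
print(y.shape)  # (19, 6, 6)
```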