TensorRT 3.0 concat result is wrong with batch size > 1

The GPU is a P4, batch size = 2, and the concat layer uses axis = 1. The result for the second item in the batch is wrong, like this:
[A][A] + [B][B] -> concat -> [A][B][A][ERROR]
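
For a reproduction, it may help to compare the engine output against a plain CPU reference. The sketch below is only an illustration. It assumes both inputs are flat, contiguous NCHW buffers, that axis = 1 is the channel axis, and that the expected output is [A][B][A][B]. The function names and parameters (`concat_channels_flat`, `first_bad_sample`, `batch`, `ca`, `cb`, `hw`) are hypothetical and not part of TensorRT.

```python
def concat_channels_flat(a, b, batch, ca, cb, hw):
    """Reference channel concat (axis = 1) for flat NCHW buffers.

    a holds batch*ca*hw values, b holds batch*cb*hw values.
    Layout assumption: contiguous NCHW, hw = H * W.
    """
    assert len(a) == batch * ca * hw
    assert len(b) == batch * cb * hw
    out = []
    for n in range(batch):
        # For each sample, all channels of A come first, then all of B.
        out.extend(a[n * ca * hw:(n + 1) * ca * hw])
        out.extend(b[n * cb * hw:(n + 1) * cb * hw])
    return out


def first_bad_sample(actual, expected, batch, tol=1e-5):
    """Return the index of the first batch sample that differs, or None."""
    assert len(actual) == len(expected)
    per_sample = len(expected) // batch
    for n in range(batch):
        lo, hi = n * per_sample, (n + 1) * per_sample
        if any(abs(x - y) > tol
               for x, y in zip(actual[lo:hi], expected[lo:hi])):
            return n
    return None


# Tiny example: batch = 2, one channel per input, two values per channel.
A = [1.0, 2.0, 1.0, 2.0]   # [A][A]
B = [3.0, 4.0, 3.0, 4.0]   # [B][B]
expected = concat_channels_flat(A, B, batch=2, ca=1, cb=1, hw=2)

# Made-up output imitating the symptom above: second B block is wrong.
observed = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 0.0, 0.0]
bad = first_bad_sample(observed, expected, batch=2)
```

Here `observed` is invented data that mimics the reported pattern. In a real check it would be the buffer copied back from the engine, and `bad` would tell which batch sample first disagrees with the reference.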

Please file a bug here: https://developer.nvidia.com/nvidia-developer-program
Please include the steps/files used to reproduce the problem along with the output of infer_device.