Training time per epoch keeps getting slower

Hello, everyone!

My CNN training speed decreases dramatically in GPU mode, so I have to restart the training by hand after several epochs. The first epoch is always the fastest, and then the speed decreases linearly. Someone said it may be caused by a memory leak in MEX files, but I only use MATLAB 2016a + NVIDIA GPU Computing Toolkit 7.5 + MatConvNet 1.0-beta20. Are there any known bugs in them?

My GPU is:

g = CUDADevice with properties:

                  Name: 'GeForce GTX TITAN X'
                 Index: 1
     ComputeCapability: '5.2'
        SupportsDouble: 1
         DriverVersion: 7.5000
        ToolkitVersion: 7.5000
    MaxThreadsPerBlock: 1024
      MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
           MaxGridSize: [2.1475e+09 65535 65535]
             SIMDWidth: 32
           TotalMemory: 1.2885e+10
       AvailableMemory: 1.0680e+10
   MultiprocessorCount: 24
          ClockRateKHz: 1076000
           ComputeMode: 'Default'
  GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
      CanMapHostMemory: 1
       DeviceSupported: 1
        DeviceSelected: 1

In CPU mode, the speed is very stable.

When I restart MATLAB, the first epoch is the fastest again, and then training gets slower and slower again.

Can anyone help me with this?