Downsampling over the sequence length in LSTMs


I am trying to build an architecture with two stacked LSTMs, where the output sequence of the first cell is downsampled before going into the second cell.

For example, if the first cell works on a sequence of length SEQ_LEN, the second cell gets an input of sequence length SEQ_LEN/2: for each valid k, we take the element-wise max of the vectors at positions 2k and 2k+1 in the output sequence of the first cell.

I searched the documentation and didn’t find anything referring to this type of downsampling. Did I miss some parameter or feature in the docs? Or do I have to implement this logic myself?

In this case, you would need to implement this logic yourself. You may be able to use a pooling operation over the time dimension to do the max pooling.
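As a rough illustration of the pairwise max described in the question, here is a minimal NumPy sketch. It is not tied to any particular LSTM library. The time-major layout (seq_len, batch, channels), the helper name `max_pool_pairs`, and the choice to drop a trailing unpaired timestep are all assumptions made for the example.

```python
import numpy as np

def max_pool_pairs(seq):
    """Halve the time axis by taking the element-wise max of timesteps 2k and 2k+1.

    `seq` is assumed to be time-major: (seq_len, batch, channels).
    """
    seq_len = seq.shape[0]
    # Assumption: if seq_len is odd, the last unpaired timestep is dropped.
    even_len = seq_len - (seq_len % 2)
    # Group consecutive timesteps into pairs: (seq_len/2, 2, batch, channels).
    pairs = seq[:even_len].reshape(even_len // 2, 2, *seq.shape[1:])
    # Reduce over the pair axis.
    return pairs.max(axis=1)

# Stand-in for the first LSTM's output: 6 timesteps, batch 1, 2 channels.
out1 = np.arange(6 * 1 * 2, dtype=np.float32).reshape(6, 1, 2)
out1[0] += 100  # make timestep 0 larger than timestep 1
pooled = max_pool_pairs(out1)
print(pooled.shape)  # (3, 1, 2)
```

The result `pooled` would then be fed to the second LSTM as a sequence of length SEQ_LEN/2.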



What if the downsampling is done by skipping every other timestep in the output of the first layer instead of using pooling? Do I just change the memory descriptor to get a strided traversal that skips the odd timesteps?
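To make the strided idea concrete, here is a small NumPy sketch of what skipping the odd timesteps looks like as a stride change. It only shows the concept on a time-major (seq_len, batch, channels) array, which is an assumed layout. Whether a given LSTM implementation accepts such a strided input is not confirmed anywhere in this thread.

```python
import numpy as np

# Stand-in for the first LSTM's output: 6 timesteps, batch 1, 2 channels.
out1 = np.arange(6 * 1 * 2, dtype=np.float32).reshape(6, 1, 2)

# Keep timesteps 0, 2, 4, ... This is a view: no data is copied,
# only the stride along the time axis is doubled.
skipped = out1[::2]

print(skipped.shape)                        # (3, 1, 2)
print(out1.strides[0], skipped.strides[0])  # time stride in bytes, before and after
print(np.shares_memory(out1, skipped))      # True

# If the consumer needs dense memory, an explicit copy would be required.
dense = np.ascontiguousarray(skipped)
```

The view keeps the same underlying buffer, so the downsampling itself costs nothing. The open question is only whether the second layer can read non-contiguous input directly or needs the dense copy.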

Is the issue resolved? Did the above-mentioned approach work in your case?