Hello,
I am trying to run the sample_nmt sample provided with TensorRT 4. I followed the README and downloaded the vocab data and weights. When I attempt to run the sample, I get the following:
./sample_nmt --data_dir=/home/tensorrt/TensorRT-4.0.1.6/samples/sampleNMT/data/deen/
data_dir: /home/tensorrt/TensorRT-4.0.1.6/samples/sampleNMT/data/deen/
Component Info:
- Data Reader: Text Reader, vocabulary size = 36548
- Input Embedder: SLP Embedder, num inputs = 36548, num outputs = 512
- Output Embedder: SLP Embedder, num inputs = 36548, num outputs = 512
- Encoder: LSTM Encoder, num layers = 2, num units = 512
- Decoder: LSTM Decoder, num layers = 2, num units = 512
- Alignment: Multiplicative Alignment, source states size = 512, attention keys size = 512
- Context: Ragged softmax + Batch GEMM
- Attention: SLP Attention, num inputs = 1024, num outputs = 512
- Projection: SLP Projection, num inputs = 512, num outputs = 36548
- Likelihood: Softmax Likelihood
- Search Policy: Beam Search Policy, beam = 5
- Data Writer: BLEU Score Writer, max order = 4
End of Component Info
Segmentation fault (core dumped)
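
In case a stack trace would help, I can reproduce the crash under a debugger. A minimal way to capture a backtrace (gdb is not part of the sample, just a generic debugging step; same binary and data_dir as above):

gdb --args ./sample_nmt --data_dir=/home/tensorrt/TensorRT-4.0.1.6/samples/sampleNMT/data/deen/
(gdb) run
... crashes with SIGSEGV ...
(gdb) bt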
I checked the md5sums of the vocab data. newstest2015.tok.bpe.32000.de and .en match the documented checksums, but vocab.bpe.32000.en and .de both hash to b748c9ac3f3aefa5e2286397f03dfdfb rather than the c1d0ca6d4994c75574f28df7c9e8253f listed in the instructions (https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#nmt_prepare).
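
For reference, the checks I ran were along these lines (run from the data_dir above; file names as downloaded per the README):

cd /home/tensorrt/TensorRT-4.0.1.6/samples/sampleNMT/data/deen/
md5sum newstest2015.tok.bpe.32000.de newstest2015.tok.bpe.32000.en
md5sum vocab.bpe.32000.de vocab.bpe.32000.en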
My initial guess is that the vocab is slightly different from the one the sample was tested against (according to the manual, March 26, 2018). I tried downloading the data directly from google/seq2seq (seq2seq/nmt.md at master · google/seq2seq · GitHub), but the packaged vocab is again different (md5sum: 2f2dea8696324078749b750d0ceff8c2).
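
I have not dug into the actual differences yet, but a quick comparison like the following (the path to the second, seq2seq-downloaded copy is illustrative) should show whether it is just a reordering of tokens or genuinely different vocabulary entries:

diff <(sort vocab.bpe.32000.en) <(sort /path/to/seq2seq-download/vocab.bpe.32000.en) | head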
Any ideas?
Thanks,
Andy