Loading experiment spec at /workspace/spec_255to19.txt.
Running for 240 Epochs
Epoch: 0/240:, Cur-Step: 0, loss(cross_entropy): 2.97355, Running average loss:2.97355, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 175, loss(cross_entropy): 2.10496, Running average loss:2.33482, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 350, loss(cross_entropy): 1.52638, Running average loss:2.86577, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 525, loss(cross_entropy): 20.49440, Running average loss:3.19219, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 700, loss(cross_entropy): 1.87202, Running average loss:3.38809, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 875, loss(cross_entropy): 12.24792, Running average loss:3.47274, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 1050, loss(cross_entropy): 1.82034, Running average loss:3.66717, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 1225, loss(cross_entropy): 1.80049, Running average loss:3.74322, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 1400, loss(cross_entropy): 2.36367, Running average loss:3.83895, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 0/240:, Cur-Step: 1575, loss(cross_entropy): 6.98213, Running average loss:3.95323, Time taken: 0:00:00 ETA: 0:00:00
Epoch: 1/240:, Cur-Step: 1750, loss(cross_entropy): 5.19372, Running average loss:5.19372, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 1925, loss(cross_entropy): 4.59479, Running average loss:5.93682, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2100, loss(cross_entropy): 7.83309, Running average loss:5.67063, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2275, loss(cross_entropy): 2.49193, Running average loss:5.31658, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2450, loss(cross_entropy): 2.59128, Running average loss:5.47445, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2625, loss(cross_entropy): 5.96433, Running average loss:5.30710, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2800, loss(cross_entropy): 2.71793, Running average loss:5.72172, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 2975, loss(cross_entropy): 9.64003, Running average loss:6.21375, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 3150, loss(cross_entropy): 9.92613, Running average loss:6.32580, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 1/240:, Cur-Step: 3325, loss(cross_entropy): 3.47124, Running average loss:6.56394, Time taken: 0:00:26.263160 ETA: 1:44:36.895352
Epoch: 2/240:, Cur-Step: 3500, loss(cross_entropy): 11.36320, Running average loss:11.36320, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 3675, loss(cross_entropy): 8.33066, Running average loss:9.22459, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 3850, loss(cross_entropy): 15.07006, Running average loss:10.23242, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4025, loss(cross_entropy): 7.81667, Running average loss:10.48296, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4200, loss(cross_entropy): 10.36883, Running average loss:9.53271, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4375, loss(cross_entropy): 13.05281, Running average loss:9.48067, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4550, loss(cross_entropy): 3.48899, Running average loss:9.62583, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4725, loss(cross_entropy): 13.54017, Running average loss:9.74092, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 4900, loss(cross_entropy): 10.31103, Running average loss:10.42597, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 2/240:, Cur-Step: 5075, loss(cross_entropy): 4.01739, Running average loss:10.33932, Time taken: 0:00:27.613067 ETA: 1:49:31.910038
Epoch: 3/240:, Cur-Step: 5250, loss(cross_entropy): 16.85646, Running average loss:16.85646, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 5425, loss(cross_entropy): 6.15663, Running average loss:12.59133, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 5600, loss(cross_entropy): 5.66125, Running average loss:10.98638, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 5775, loss(cross_entropy): 14.70737, Running average loss:11.14147, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 5950, loss(cross_entropy): 15.81525, Running average loss:11.40353, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 6125, loss(cross_entropy): 6.33453, Running average loss:11.75048, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 6300, loss(cross_entropy): 15.02018, Running average loss:12.25228, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 6475, loss(cross_entropy): 10.24831, Running average loss:12.54505, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 6650, loss(cross_entropy): 2.68494, Running average loss:12.66512, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 3/240:, Cur-Step: 6825, loss(cross_entropy): 2.97193, Running average loss:12.98659, Time taken: 0:00:27.679941 ETA: 1:49:20.146059
Epoch: 4/240:, Cur-Step: 7000, loss(cross_entropy): 6.90021, Running average loss:6.90021, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 7175, loss(cross_entropy): 14.51169, Running average loss:16.65381, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 7350, loss(cross_entropy): 32.62397, Running average loss:17.33362, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 7525, loss(cross_entropy): 20.44897, Running average loss:16.88818, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 7700, loss(cross_entropy): 16.73552, Running average loss:17.45542, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 7875, loss(cross_entropy): 21.92655, Running average loss:16.79563, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 8050, loss(cross_entropy): 8.21126, Running average loss:17.00806, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 8225, loss(cross_entropy): 14.00029, Running average loss:17.36651, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 8400, loss(cross_entropy): 10.84811, Running average loss:17.17518, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 4/240:, Cur-Step: 8575, loss(cross_entropy): 13.34948, Running average loss:16.92073, Time taken: 0:00:28.479658 ETA: 1:52:01.199262
Epoch: 5/240:, Cur-Step: 8750, loss(cross_entropy): 17.55043, Running average loss:17.55043, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 8925, loss(cross_entropy): 10.41822, Running average loss:15.17152, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9100, loss(cross_entropy): 12.17200, Running average loss:15.92529, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9275, loss(cross_entropy): 13.81417, Running average loss:19.59349, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9450, loss(cross_entropy): 5.48245, Running average loss:19.36751, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9625, loss(cross_entropy): 18.08485, Running average loss:19.35624, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9800, loss(cross_entropy): 9.94846, Running average loss:19.31589, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 9975, loss(cross_entropy): 207.17238, Running average loss:19.11459, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 10150, loss(cross_entropy): 22.53954, Running average loss:19.49921, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 5/240:, Cur-Step: 10325, loss(cross_entropy): 20.33259, Running average loss:19.40976, Time taken: 0:00:28.408903 ETA: 1:51:16.092178
Epoch: 6/240:, Cur-Step: 10500, loss(cross_entropy): 28.07764, Running average loss:28.07764, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 10675, loss(cross_entropy): 3.06355, Running average loss:19.82113, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 10850, loss(cross_entropy): 35.27596, Running average loss:22.28593, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11025, loss(cross_entropy): 27.43542, Running average loss:24.43782, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11200, loss(cross_entropy): 34.57122, Running average loss:25.58741, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11375, loss(cross_entropy): 10.09493, Running average loss:28.38770, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11550, loss(cross_entropy): 57.05129, Running average loss:30.10102, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11725, loss(cross_entropy): 29.79902, Running average loss:31.91623, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 11900, loss(cross_entropy): 22.65955, Running average loss:31.82731, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 6/240:, Cur-Step: 12075, loss(cross_entropy): 23.35737, Running average loss:32.61655, Time taken: 0:00:28.465283 ETA: 1:51:00.876314
Epoch: 7/240:, Cur-Step: 12250, loss(cross_entropy): 50.48221, Running average loss:50.48221, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 12425, loss(cross_entropy): 52.70041, Running average loss:43.99447, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 12600, loss(cross_entropy): 60.49463, Running average loss:40.87273, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 12775, loss(cross_entropy): 39.82799, Running average loss:40.62234, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 12950, loss(cross_entropy): 48.64680, Running average loss:38.86495, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 13125, loss(cross_entropy): 62.98654, Running average loss:39.05457, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 13300, loss(cross_entropy): 28.92304, Running average loss:38.63515, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 13475, loss(cross_entropy): 75.60844, Running average loss:38.21761, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 13650, loss(cross_entropy): 11.24865, Running average loss:37.44655, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 7/240:, Cur-Step: 13825, loss(cross_entropy): 57.62770, Running average loss:37.15671, Time taken: 0:00:27.799846 ETA: 1:47:57.364158
Epoch: 8/240:, Cur-Step: 14000, loss(cross_entropy): 48.44685, Running average loss:48.44685, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 14175, loss(cross_entropy): 40.73534, Running average loss:36.49459, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 14350, loss(cross_entropy): 12.59124, Running average loss:36.70106, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 14525, loss(cross_entropy): 28.66698, Running average loss:37.36619, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 14700, loss(cross_entropy): 24.48158, Running average loss:35.29129, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 14875, loss(cross_entropy): 35.75267, Running average loss:35.32170, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 15050, loss(cross_entropy): 27.94627, Running average loss:35.41428, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 15225, loss(cross_entropy): 25.94492, Running average loss:34.17837, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 15400, loss(cross_entropy): 32.43598, Running average loss:34.15762, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 8/240:, Cur-Step: 15575, loss(cross_entropy): 67.83212, Running average loss:34.10430, Time taken: 0:00:28.332444 ETA: 1:49:33.127108
Epoch: 9/240:, Cur-Step: 15750, loss(cross_entropy): 25.52960, Running average loss:25.52960, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 15925, loss(cross_entropy): 23.67295, Running average loss:31.31346, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16100, loss(cross_entropy): 18.69393, Running average loss:32.19014, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16275, loss(cross_entropy): 16.28861, Running average loss:31.97879, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16450, loss(cross_entropy): 28.41758, Running average loss:31.13426, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16625, loss(cross_entropy): 25.26822, Running average loss:30.51106, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16800, loss(cross_entropy): 22.65336, Running average loss:30.73686, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 16975, loss(cross_entropy): 33.37226, Running average loss:30.78299, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 17150, loss(cross_entropy): 29.93199, Running average loss:31.02376, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 9/240:, Cur-Step: 17325, loss(cross_entropy): 23.72595, Running average loss:31.06873, Time taken: 0:00:27.351914 ETA: 1:45:18.292063
Epoch: 10/240:, Cur-Step: 17500, loss(cross_entropy): 27.51016, Running average loss:27.51016, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 17675, loss(cross_entropy): 29.25977, Running average loss:31.05461, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 17850, loss(cross_entropy): 37.26941, Running average loss:31.15054, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18025, loss(cross_entropy): 13.73830, Running average loss:32.74477, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18200, loss(cross_entropy): 42.33891, Running average loss:32.85256, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18375, loss(cross_entropy): 44.56697, Running average loss:31.88464, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18550, loss(cross_entropy): 29.89429, Running average loss:32.45634, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18725, loss(cross_entropy): 32.39836, Running average loss:31.63962, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 18900, loss(cross_entropy): 21.20871, Running average loss:31.10440, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 10/240:, Cur-Step: 19075, loss(cross_entropy): 12.38001, Running average loss:30.77267, Time taken: 0:00:27.084113 ETA: 1:43:49.346073
Epoch: 11/240:, Cur-Step: 19250, loss(cross_entropy): 21.87619, Running average loss:21.87619, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 19425, loss(cross_entropy): 37.80048, Running average loss:24.14484, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 19600, loss(cross_entropy): 21.45160, Running average loss:30.11260, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 19775, loss(cross_entropy): 12.49071, Running average loss:29.58788, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 19950, loss(cross_entropy): 15.77384, Running average loss:29.25259, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 20125, loss(cross_entropy): 24.09658, Running average loss:29.14718, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 20300, loss(cross_entropy): 36.17828, Running average loss:28.39217, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 20475, loss(cross_entropy): 39.08428, Running average loss:28.98612, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 20650, loss(cross_entropy): 16.84640, Running average loss:29.81151, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 11/240:, Cur-Step: 20825, loss(cross_entropy): 23.94439, Running average loss:29.66313, Time taken: 0:00:27.267084 ETA: 1:44:04.162209
Epoch: 12/240:, Cur-Step: 21000, loss(cross_entropy): 46.81028, Running average loss:46.81028, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 21175, loss(cross_entropy): 11.19377, Running average loss:27.85498, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 21350, loss(cross_entropy): 19.39407, Running average loss:27.76894, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 21525, loss(cross_entropy): 40.85095, Running average loss:27.34611, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 21700, loss(cross_entropy): 15.50670, Running average loss:28.03875, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 21875, loss(cross_entropy): 22.58064, Running average loss:26.73000, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 22050, loss(cross_entropy): 42.52981, Running average loss:26.73347, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 22225, loss(cross_entropy): 33.67472, Running average loss:26.99231, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 22400, loss(cross_entropy): 12.67815, Running average loss:26.99073, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 12/240:, Cur-Step: 22575, loss(cross_entropy): 40.10867, Running average loss:26.52822, Time taken: 0:00:27.835447 ETA: 1:45:46.481933
Epoch: 13/240:, Cur-Step: 22750, loss(cross_entropy): 17.03746, Running average loss:17.03746, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 22925, loss(cross_entropy): 48.61608, Running average loss:22.50400, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23100, loss(cross_entropy): 14.36811, Running average loss:26.25010, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23275, loss(cross_entropy): 21.09358, Running average loss:25.40340, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23450, loss(cross_entropy): 39.33736, Running average loss:27.13042, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23625, loss(cross_entropy): 52.28710, Running average loss:27.38368, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23800, loss(cross_entropy): 19.46107, Running average loss:26.65085, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 23975, loss(cross_entropy): 15.73156, Running average loss:25.72753, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 24150, loss(cross_entropy): 23.12545, Running average loss:25.90561, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 13/240:, Cur-Step: 24325, loss(cross_entropy): 75.92471, Running average loss:25.42011, Time taken: 0:00:27.571558 ETA: 1:44:18.743666
Epoch: 14/240:, Cur-Step: 24500, loss(cross_entropy): 37.73255, Running average loss:37.73255, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 24675, loss(cross_entropy): 38.58490, Running average loss:27.48900, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 24850, loss(cross_entropy): 41.17404, Running average loss:23.81552, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25025, loss(cross_entropy): 27.19807, Running average loss:24.92343, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25200, loss(cross_entropy): 21.25667, Running average loss:25.68415, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25375, loss(cross_entropy): 20.49899, Running average loss:24.68163, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25550, loss(cross_entropy): 14.63152, Running average loss:24.45847, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25725, loss(cross_entropy): 23.93507, Running average loss:25.17148, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 25900, loss(cross_entropy): 29.87931, Running average loss:25.03410, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 14/240:, Cur-Step: 26075, loss(cross_entropy): 19.25637, Running average loss:25.49798, Time taken: 0:00:28.328226 ETA: 1:46:42.178988
Epoch: 15/240:, Cur-Step: 26250, loss(cross_entropy): 12.56713, Running average loss:12.56713, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 26425, loss(cross_entropy): 22.92260, Running average loss:17.23582, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 26600, loss(cross_entropy): 21.58149, Running average loss:20.10641, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 26775, loss(cross_entropy): 43.03142, Running average loss:22.03109, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 26950, loss(cross_entropy): 23.26792, Running average loss:22.68702, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 27125, loss(cross_entropy): 25.33488, Running average loss:23.28270, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 27300, loss(cross_entropy): 33.42113, Running average loss:23.35506, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 27475, loss(cross_entropy): 13.98232, Running average loss:24.27577, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 27650, loss(cross_entropy): 16.85742, Running average loss:23.65927, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 15/240:, Cur-Step: 27825, loss(cross_entropy): 45.44421, Running average loss:23.95412, Time taken: 0:00:27.919748 ETA: 1:44:41.943262
Epoch: 16/240:, Cur-Step: 28000, loss(cross_entropy): 19.01698, Running average loss:19.01698, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 28175, loss(cross_entropy): 25.39265, Running average loss:22.89304, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 28350, loss(cross_entropy): 21.18563, Running average loss:23.96657, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 28525, loss(cross_entropy): 21.33551, Running average loss:23.27589, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 28700, loss(cross_entropy): 36.13998, Running average loss:23.10489, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 28875, loss(cross_entropy): 72.41563, Running average loss:23.54076, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 29050, loss(cross_entropy): 22.29706, Running average loss:24.18528, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 29225, loss(cross_entropy): 15.84551, Running average loss:23.15629, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 29400, loss(cross_entropy): 28.91731, Running average loss:22.82906, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 16/240:, Cur-Step: 29575, loss(cross_entropy): 15.13240, Running average loss:22.76668, Time taken: 0:00:27.724307 ETA: 1:43:30.244675
Epoch: 17/240:, Cur-Step: 29750, loss(cross_entropy): 13.34676, Running average loss:13.34676, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 29925, loss(cross_entropy): 17.06515, Running average loss:21.48439, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30100, loss(cross_entropy): 27.28145, Running average loss:22.32643, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30275, loss(cross_entropy): 17.00446, Running average loss:23.59509, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30450, loss(cross_entropy): 14.95757, Running average loss:22.84580, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30625, loss(cross_entropy): 8.24830, Running average loss:22.94007, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30800, loss(cross_entropy): 54.93713, Running average loss:23.65098, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 30975, loss(cross_entropy): 9.00665, Running average loss:22.80501, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 31150, loss(cross_entropy): 15.59656, Running average loss:22.78551, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 17/240:, Cur-Step: 31325, loss(cross_entropy): 20.59897, Running average loss:22.76914, Time taken: 0:00:28.245551 ETA: 1:44:58.757951
Epoch: 18/240:, Cur-Step: 31500, loss(cross_entropy): 28.29924, Running average loss:28.29924, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 31675, loss(cross_entropy): 27.83092, Running average loss:26.14442, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 31850, loss(cross_entropy): 17.53442, Running average loss:24.34653, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32025, loss(cross_entropy): 18.33836, Running average loss:24.64684, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32200, loss(cross_entropy): 44.26008, Running average loss:23.56209, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32375, loss(cross_entropy): 46.00809, Running average loss:23.46312, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32550, loss(cross_entropy): 33.88716, Running average loss:23.03737, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32725, loss(cross_entropy): 11.98004, Running average loss:22.84018, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 32900, loss(cross_entropy): 30.96837, Running average loss:22.85011, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 18/240:, Cur-Step: 33075, loss(cross_entropy): 20.22805, Running average loss:22.74668, Time taken: 0:00:28.118820 ETA: 1:44:02.378082
Epoch: 19/240:, Cur-Step: 33250, loss(cross_entropy): 18.06378, Running average loss:18.06378, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 33425, loss(cross_entropy): 31.09880, Running average loss:29.31878, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 33600, loss(cross_entropy): 24.40600, Running average loss:32.35582, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 33775, loss(cross_entropy): 26.14706, Running average loss:28.94874, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 33950, loss(cross_entropy): 48.37971, Running average loss:28.60942, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 34125, loss(cross_entropy): 13.51628, Running average loss:27.63324, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 34300, loss(cross_entropy): 16.93643, Running average loss:27.38930, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 34475, loss(cross_entropy): 25.69830, Running average loss:27.90110, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 34650, loss(cross_entropy): 38.29307, Running average loss:27.62378, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 19/240:, Cur-Step: 34825, loss(cross_entropy): 29.67404, Running average loss:26.52225, Time taken: 0:00:27.610368 ETA: 1:41:41.891436
Epoch: 20/240:, Cur-Step: 35000, loss(cross_entropy): 130.82886, Running average loss:130.82886, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 35175, loss(cross_entropy): 28.22937, Running average loss:23.30720, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 35350, loss(cross_entropy): 14.09888, Running average loss:22.62403, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 35525, loss(cross_entropy): 25.74077, Running average loss:22.41575, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 35700, loss(cross_entropy): 32.66961, Running average loss:23.45405, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 35875, loss(cross_entropy): 11.47605, Running average loss:24.38612, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 36050, loss(cross_entropy): 16.31390, Running average loss:26.83028, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 36225, loss(cross_entropy): 14.09104, Running average loss:26.09778, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 36400, loss(cross_entropy): 42.53473, Running average loss:28.03135, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 20/240:, Cur-Step: 36575, loss(cross_entropy): 24.43196, Running average loss:29.20927, Time taken: 0:00:26.981140 ETA: 1:38:55.850883
Epoch: 21/240:, Cur-Step: 36750, loss(cross_entropy): 8.87781, Running average loss:8.87781, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 36925, loss(cross_entropy): 66.24826, Running average loss:32.93134, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37100, loss(cross_entropy): 33.62341, Running average loss:35.72619, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37275, loss(cross_entropy): 57.27899, Running average loss:36.79869, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37450, loss(cross_entropy): 34.12579, Running average loss:35.53221, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37625, loss(cross_entropy): 39.56116, Running average loss:35.09910, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37800, loss(cross_entropy): 43.27170, Running average loss:34.81295, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 37975, loss(cross_entropy): 10.67711, Running average loss:31.43654, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 38150, loss(cross_entropy): 19.33972, Running average loss:30.06124, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 21/240:, Cur-Step: 38325, loss(cross_entropy): 20.06974, Running average loss:30.20703, Time taken: 0:00:27.660117 ETA: 1:40:57.565551
Epoch: 22/240:, Cur-Step: 38500, loss(cross_entropy): 23.63970, Running average loss:23.63970, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 38675, loss(cross_entropy): 9.01602, Running average loss:19.15760, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 38850, loss(cross_entropy): 17.99681, Running average loss:21.29612, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39025, loss(cross_entropy): 25.60028, Running average loss:25.09108, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39200, loss(cross_entropy): 25.91805, Running average loss:25.50328, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39375, loss(cross_entropy): 17.56720, Running average loss:25.22780, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39550, loss(cross_entropy): 26.69650, Running average loss:26.98283, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39725, loss(cross_entropy): 19.96941, Running average loss:27.27957, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 39900, loss(cross_entropy): 15.03324, Running average loss:27.03558, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 22/240:, Cur-Step: 40075, loss(cross_entropy): 10.55925, Running average loss:25.20545, Time taken: 0:00:27.266257 ETA: 1:39:04.044088
Epoch: 23/240:, Cur-Step: 40250, loss(cross_entropy): 7.32204, Running average loss:7.32204, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 40425, loss(cross_entropy): 7.92856, Running average loss:38.65161, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 40600, loss(cross_entropy): 15.97189, Running average loss:32.97982, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 40775, loss(cross_entropy): 21.21016, Running average loss:33.35497, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 40950, loss(cross_entropy): 19.43335, Running average loss:28.54031, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 41125, loss(cross_entropy): 14.19706, Running average loss:25.23092, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 41300, loss(cross_entropy): 38.07809, Running average loss:24.70285, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 41475, loss(cross_entropy): 10.43310, Running average loss:23.30123, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 41650, loss(cross_entropy): 14.51285, Running average loss:22.55097, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 23/240:, Cur-Step: 41825, loss(cross_entropy): 48.10115, Running average loss:23.62497, Time taken: 0:00:27.921411 ETA: 1:40:58.946092
Epoch: 24/240:, Cur-Step: 42000, loss(cross_entropy): 19.30618, Running average loss:19.30618, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 42175, loss(cross_entropy): 26.90832, Running average loss:31.74410, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 42350, loss(cross_entropy): 19.10077, Running average loss:31.49474, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 42525, loss(cross_entropy): 7.93653, Running average loss:26.98662, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 42700, loss(cross_entropy): 34.30414, Running average loss:26.33862, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 42875, loss(cross_entropy): 40.55352, Running average loss:26.56686, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 43050, loss(cross_entropy): 17.72376, Running average loss:26.32458, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 43225, loss(cross_entropy): 7.46388, Running average loss:24.61120, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 43400, loss(cross_entropy): 34.74850, Running average loss:24.37419, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 24/240:, Cur-Step: 43575, loss(cross_entropy): 26.20623, Running average loss:25.36805, Time taken: 0:00:28.012387 ETA: 1:40:50.675549
Epoch: 25/240:, Cur-Step: 43750, loss(cross_entropy): 23.23136, Running average loss:23.23136, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 43925, loss(cross_entropy): 47.13501, Running average loss:33.68877, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44100, loss(cross_entropy): 30.18619, Running average loss:36.23922, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44275, loss(cross_entropy): 26.11545, Running average loss:33.34297, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44450, loss(cross_entropy): 14.70662, Running average loss:30.55847, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44625, loss(cross_entropy): 6.58802, Running average loss:29.08268, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44800, loss(cross_entropy): 30.50866, Running average loss:29.13398, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 44975, loss(cross_entropy): 70.65530, Running average loss:29.89120, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 45150, loss(cross_entropy): 29.10534, Running average loss:31.67377, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 25/240:, Cur-Step: 45325, loss(cross_entropy): 13.66415, Running average loss:31.31645, Time taken: 0:00:27.477464 ETA: 1:38:27.654854
Epoch: 26/240:, Cur-Step: 45500, loss(cross_entropy): 16.91474, Running average loss:16.91474, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 45675, loss(cross_entropy): 23.93484, Running average loss:63.61133, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 45850, loss(cross_entropy): 22.23991, Running average loss:48.09814, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46025, loss(cross_entropy): 32.98484, Running average loss:40.69484, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46200, loss(cross_entropy): 23.92840, Running average loss:39.40798, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46375, loss(cross_entropy): 12.92407, Running average loss:36.82237, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46550, loss(cross_entropy): 17.62067, Running average loss:34.47463, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46725, loss(cross_entropy): 33.96826, Running average loss:34.24284, Time taken: 0:00:27.660949 ETA: 1:38:39.443084
Epoch: 26/240:, Cur-Step: 46900, loss(cross_entropy): 60.26830, Running average loss:33.56092, Time taken: 
0:00:27.660949 ETA: 1:38:39.443084 Epoch: 26/240:, Cur-Step: 47075, loss(cross_entropy): 36.39840, Running average loss:34.63967, Time taken: 0:00:27.660949 ETA: 1:38:39.443084 Epoch: 27/240:, Cur-Step: 47250, loss(cross_entropy): 10.06231, Running average loss:10.06231, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 47425, loss(cross_entropy): 24.97489, Running average loss:23.30479, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 47600, loss(cross_entropy): 11.42934, Running average loss:33.04949, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 47775, loss(cross_entropy): 5.75143, Running average loss:27.29245, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 47950, loss(cross_entropy): 35.82261, Running average loss:24.09510, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 48125, loss(cross_entropy): 50.38977, Running average loss:25.92383, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 48300, loss(cross_entropy): 25.32853, Running average loss:27.55946, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 48475, loss(cross_entropy): 133.55482, Running average loss:29.77965, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 48650, loss(cross_entropy): 28.25405, Running average loss:32.10829, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 27/240:, Cur-Step: 48825, loss(cross_entropy): 70.33047, Running average loss:33.05974, Time taken: 0:00:27.701398 ETA: 1:38:20.397752 Epoch: 28/240:, Cur-Step: 49000, loss(cross_entropy): 15.48495, Running average loss:15.48495, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 49175, loss(cross_entropy): 16.13934, Running average loss:24.65855, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 49350, loss(cross_entropy): 14.73084, Running average loss:29.72679, Time taken: 
0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 49525, loss(cross_entropy): 49.73141, Running average loss:52.79896, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 49700, loss(cross_entropy): 16.86661, Running average loss:46.53542, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 49875, loss(cross_entropy): 42.73531, Running average loss:45.36191, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 50050, loss(cross_entropy): 44.67923, Running average loss:48.42287, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 50225, loss(cross_entropy): 23.00323, Running average loss:49.54516, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 50400, loss(cross_entropy): 53.61279, Running average loss:48.59221, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 28/240:, Cur-Step: 50575, loss(cross_entropy): 27.03015, Running average loss:50.77247, Time taken: 0:00:28.052039 ETA: 1:39:07.032248 Epoch: 29/240:, Cur-Step: 50750, loss(cross_entropy): 66.16262, Running average loss:66.16262, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 50925, loss(cross_entropy): 41.67230, Running average loss:87.92691, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51100, loss(cross_entropy): 5.86561, Running average loss:51.02480, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51275, loss(cross_entropy): 3.49822, Running average loss:35.78214, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51450, loss(cross_entropy): 4.73060, Running average loss:28.63942, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51625, loss(cross_entropy): 16.12980, Running average loss:24.70271, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51800, loss(cross_entropy): 83.88812, Running average loss:24.45003, Time taken: 0:00:27.721052 
ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 51975, loss(cross_entropy): 28.79474, Running average loss:30.31909, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 52150, loss(cross_entropy): 51.43472, Running average loss:33.54001, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 29/240:, Cur-Step: 52325, loss(cross_entropy): 86.96181, Running average loss:38.33317, Time taken: 0:00:27.721052 ETA: 1:37:29.142058 Epoch: 30/240:, Cur-Step: 52500, loss(cross_entropy): 17.17116, Running average loss:17.17116, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 52675, loss(cross_entropy): 10.65070, Running average loss:14.25161, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 52850, loss(cross_entropy): 3.73582, Running average loss:10.84330, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53025, loss(cross_entropy): 13.77156, Running average loss:9.13255, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53200, loss(cross_entropy): 3.37814, Running average loss:9.11530, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53375, loss(cross_entropy): 8.01140, Running average loss:8.47991, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53550, loss(cross_entropy): 6.30225, Running average loss:8.22207, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53725, loss(cross_entropy): 16.71934, Running average loss:8.04378, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 53900, loss(cross_entropy): 5.24586, Running average loss:7.80042, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 30/240:, Cur-Step: 54075, loss(cross_entropy): 34.01215, Running average loss:8.12279, Time taken: 0:00:27.345923 ETA: 1:35:42.643819 Epoch: 31/240:, Cur-Step: 54250, loss(cross_entropy): 23.94794, Running average loss:23.94794, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 
Epoch: 31/240:, Cur-Step: 54425, loss(cross_entropy): 16.26011, Running average loss:22.75116, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 Epoch: 31/240:, Cur-Step: 54600, loss(cross_entropy): 45.83377, Running average loss:26.37374, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 Epoch: 31/240:, Cur-Step: 54775, loss(cross_entropy): 41.30719, Running average loss:30.49814, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 Epoch: 31/240:, Cur-Step: 54950, loss(cross_entropy): 23.57254, Running average loss:34.15678, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 Epoch: 31/240:, Cur-Step: 55125, loss(cross_entropy): 20.94315, Running average loss:32.16924, Time taken: 0:00:27.996190 ETA: 1:37:31.203775 Epoch: 31/240:, Cur-Step: 55300, loss(cross_entropy): 18.06967, Running average loss:32.03184, Time taken: 0:00:27.996190 ETA: 1:37:31.203775