TensorRT INT8 Dual head

Hi,

Not sure if this is the right board, but here goes. At https://github.com/glinscott/leela-chess we're working on an open source chess engine based on DeepMind's published work on AlphaZero and AlphaGo Zero. Since we are about to train our first (relatively small) neural network on millions of self-play games, I wanted to investigate the use of TensorRT.

I skimmed the API documentation at http://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html and my main question is whether TensorRT can handle our neural network architecture. It consists of 6 residual blocks with 64 filters, followed by 2 output heads:

Policy Head

  • A convolution of 32 filters of kernel size 1 × 1 with stride 1
  • Batch normalization
  • A rectifier nonlinearity
  • A fully connected linear layer that outputs a vector of size 1924

Value Head

  • A convolution of 32 filters of kernel size 1 × 1 with stride 1
  • Batch normalization
  • A rectifier nonlinearity
  • A fully connected linear layer to a hidden layer of size 128
  • A rectifier nonlinearity
  • A fully connected linear layer to a scalar
  • A tanh nonlinearity outputting a scalar in the range [−1, 1]

Our model is trained using TensorFlow. All the TensorRT examples use a single output head, so I was wondering how well a two-headed network like ours can be ported to TensorRT?
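
To make the layout concrete, here is a rough sketch of the tensor shapes flowing through each head. It is not our actual TensorFlow code: the 8 × 8 board size, the explicit flatten step before the first fully connected layer, and the names head_shapes and squash are assumptions made for illustration only.

```python
import math

BOARD = (8, 8)        # assumption: 8x8 chess board planes
TRUNK_FILTERS = 64    # width of the residual tower, from the post
HEAD_FILTERS = 32     # 1x1 convolution filters in each head, from the post


def head_shapes(dense_sizes):
    """Return (layer, output_shape) pairs for one head on top of the shared trunk.

    dense_sizes lists the fully connected layer widths; a ReLU is placed
    between consecutive dense layers but not after the last one.
    """
    h, w = BOARD
    shapes = [
        ("trunk output", (TRUNK_FILTERS, h, w)),
        ("conv 1x1, stride 1", (HEAD_FILTERS, h, w)),
        ("batch norm", (HEAD_FILTERS, h, w)),
        ("relu", (HEAD_FILTERS, h, w)),
        ("flatten (assumed)", (HEAD_FILTERS * h * w,)),
    ]
    for i, size in enumerate(dense_sizes):
        shapes.append((f"dense {size}", (size,)))
        if i < len(dense_sizes) - 1:
            shapes.append(("relu", (size,)))
    return shapes


def squash(x):
    """Final value-head nonlinearity: maps any real number into [-1, 1]."""
    return math.tanh(x)


# Both heads read the same trunk output, so the network has two outputs.
policy = head_shapes([1924])
value = head_shapes([128, 1]) + [("tanh", (1,))]

for name, layers in (("Policy Head", policy), ("Value Head", value)):
    print(name)
    for layer, shape in layers:
        print(f"  {layer:22s} -> {shape}")
```

The point of the sketch is that both heads branch off the same trunk output, so whatever runs the network has to expose two output tensors: a vector of size 1924 and a single scalar.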