TensorRT INT8 Dual head

Error323 · January 28, 2018, 8:39pm

Hi,

Not sure if this is the right board, but here goes. At glinscott/leela-chess (moved to https://github.com/LeelaChessZero/leela-chess), a chess adaptation of GCP's Leela Zero, we're working on an open source chess engine based on the published work from DeepMind: AlphaZero and AlphaGo Zero. Since we are on the verge of training our first (relatively small) neural network through millions of self-play games, I wanted to investigate the use of TensorRT.

I skimmed the API documentation in the TensorRT Developer Guide (NVIDIA Deep Learning TensorRT Documentation), and my main question is whether this is possible given our neural network architecture, which consists of 6 residual blocks of 64 filters and 2 output heads:

Policy Head

A convolution of 32 filters of kernel size 1 × 1 with stride 1

Batch normalization

A rectifier nonlinearity

A fully connected linear layer that outputs a vector of size 1924

Value Head

A convolution of 32 filters of kernel size 1 × 1 with stride 1

Batch normalization

A rectifier nonlinearity

A fully connected linear layer to a hidden layer of size 128

A rectifier nonlinearity

A fully connected linear layer to a scalar

A tanh nonlinearity outputting a scalar in the range [−1, 1]
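
To make the shapes of the two heads concrete, here is a minimal NumPy sketch of the layers listed above. It is only an illustration, not our actual TensorFlow code: the 8 × 8 board, the NCHW layout, the random weights, and every function name are assumptions made for the example, and the batch norm uses fixed inference-time statistics without a learned scale or shift. The point is that one shared input produces two separate output tensors.

```python
import numpy as np

# Assumptions (not stated in the post): 8x8 board, NCHW layout, random weights.
BOARD, TRUNK_FILTERS, HEAD_FILTERS = 8, 64, 32
POLICY_SIZE, VALUE_HIDDEN = 1924, 128


def conv1x1(x, w):
    """1x1, stride-1 convolution: a per-square matmul over channels."""
    return np.einsum("nchw,oc->nohw", x, w)


def batch_norm(x, mean, var, eps=1e-5):
    """Inference-mode batch norm with per-channel statistics (no scale/shift)."""
    return (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + eps)


def relu(x):
    return np.maximum(x, 0.0)


def make_params(rng):
    """Random placeholder weights with the layer shapes listed above."""
    flat = HEAD_FILTERS * BOARD * BOARD

    def w(*shape):
        return rng.standard_normal(shape) * 0.05

    return {
        "p_conv": w(HEAD_FILTERS, TRUNK_FILTERS),
        "p_fc": w(flat, POLICY_SIZE),
        "v_conv": w(HEAD_FILTERS, TRUNK_FILTERS),
        "v_fc1": w(flat, VALUE_HIDDEN),
        "v_fc2": w(VALUE_HIDDEN, 1),
        "bn_mean": np.zeros(HEAD_FILTERS),
        "bn_var": np.ones(HEAD_FILTERS),
    }


def dual_head(trunk, p):
    """Apply both heads to the output of the residual tower."""
    n = trunk.shape[0]
    # Policy head: conv 1x1 -> batch norm -> ReLU -> fully connected (1924 logits).
    h = relu(batch_norm(conv1x1(trunk, p["p_conv"]), p["bn_mean"], p["bn_var"]))
    policy = h.reshape(n, -1) @ p["p_fc"]
    # Value head: conv 1x1 -> batch norm -> ReLU -> FC 128 -> ReLU -> FC 1 -> tanh.
    h = relu(batch_norm(conv1x1(trunk, p["v_conv"]), p["bn_mean"], p["bn_var"]))
    h = relu(h.reshape(n, -1) @ p["v_fc1"])
    value = np.tanh(h @ p["v_fc2"])
    return policy, value


rng = np.random.default_rng(0)
trunk_out = rng.standard_normal((4, TRUNK_FILTERS, BOARD, BOARD))  # batch of 4
policy, value = dual_head(trunk_out, make_params(rng))
print(policy.shape, value.shape)  # (4, 1924) (4, 1)
```

Whatever inference runtime is used would need to expose both `policy` and `value` as outputs of the same forward pass.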

Our model is trained using TensorFlow. All the TensorRT examples use a single output head, so I was wondering how portable the TensorRT solution is to a network with two heads like ours.