TensorRT 3 questions

I’m developing an experimental style transfer shader for Unreal Engine 4. The target platform is mainly Windows. Since TRT 3 is expected to add Windows support, I plan to use it as the inference engine.

Questions I have in mind:

  1. What’s the expected release date for TRT 3?

  2. Since TRT performs several graph optimizations, does it offer a performance advantage over hand-optimization? I would hand-optimize at the level of cuDNN / cuBLAS calls, but would not write low-level CUDA kernels.

  3. How does TRT manage memory? I don’t want high memory consumption, since a game engine has plenty of other things competing for memory.

  4. Is there any good alternative to using cuDNN or TRT in shaders? Using CUDA-OpenGL interop locks me into an API dependency. It would be good if this could be done directly in OpenGL / Vulkan compute shaders, but I’m not sure about the performance or development time.