I’m developing an experimental style transfer shader for Unreal Engine 4. The target platform is mainly Windows. Since I see that TRT 3 will add Windows support, I plan to use it as the inference engine.
Questions in mind:
- What’s the expected release date for TRT 3?
- Since TRT performs a few graph optimizations, does it offer a performance advantage over hand-optimization? I would try to hand-optimize cuDNN / cuBLAS calls, but not low-level CUDA kernels.
- How does TRT manage memory? I don’t want high memory consumption, as there are other things going on inside a game engine.
- Is there any good alternative to using cuDNN / TRT in shaders? Using CUDA-OpenGL interop locks me into a specific API. It would be good if this could be done directly in OpenGL / Vulkan compute shaders, but I’m not sure about the performance or development time.