Hello,
TensorRT is a tool to speed up neural network inference.
I was wondering if there is an NVIDIA tool to prune neural networks, in order to speed up inference and reduce the memory footprint.
Thanks
If this is specific to a TensorFlow-based model, consider Graph Surgeon:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/graphsurgeon/graphsurgeon.html
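For context on what pruning itself does, here is a minimal sketch of magnitude-based weight pruning, the most common approach: weights with the smallest absolute values are zeroed out, producing a sparse matrix that can reduce memory and, with sparse-aware runtimes, speed up inference. This is a generic NumPy illustration, not part of TensorRT or Graph Surgeon; the function name and threshold logic are my own.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude entries of `weights`
    until roughly `sparsity` fraction of them are zero.
    (Hypothetical helper for illustration only.)"""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to prune
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune half of a small weight matrix
w = np.array([[0.9, -0.05, 0.4],
              [0.01, -0.7, 0.2]])
pruned = prune_by_magnitude(w, 0.5)
```

After pruning, the three smallest-magnitude weights (0.05, 0.01, 0.2) are zeroed while the large ones survive; real frameworks typically follow this with fine-tuning to recover accuracy.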