The Tegra TX2 block diagram shows a compression/decompression unit inside the Pascal GPU block. Is it used for compressing data in deep learning inference? Is there a CUDA API that triggers these units for a buffer/texture, or is it transparent to the developer?
The compression unit shown internal to the GPU is related to texture load/store formats (memory that is virtually contiguous may be physically spread across different memory banks for improved bandwidth or latency). It is handled transparently by the hardware and is not something developers interact with directly.
With deep learning on Jetson, the FP16 ALUs in the GPU are used. These are accessed in CUDA kernels through the half, half2, etc. datatypes. Deep learning frameworks like TensorRT and Torch/PyTorch also expose FP16 tensor primitives so you can take advantage of the acceleration.
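As a minimal sketch of what using those FP16 ALUs looks like in a CUDA kernel, the example below does a packed half2 multiply-add over an array. The kernel name, sizes, and launch configuration are illustrative, not from any particular framework; __hfma2 and the half2 type come from cuda_fp16.h and operate on two FP16 values per instruction.

```cuda
#include <cuda_fp16.h>

// Hypothetical example: y[i] = a[i] * x[i] + y[i], two FP16 lanes at a time.
// Each half2 holds a pair of FP16 values, so n here counts half2 elements.
__global__ void fma_fp16(const half2 *a, const half2 *x, half2 *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __hfma2 performs a fused multiply-add on both FP16 lanes at once,
        // which is how the doubled FP16 throughput on Pascal is reached.
        y[i] = __hfma2(a[i], x[i], y[i]);
    }
}
```

On the host side the buffers would typically be allocated with cudaMalloc and filled by converting from float with __float2half2_rn before launching the kernel. The key point is that FP16 acceleration is something you opt into via these datatypes (or via a framework that does it for you); it is unrelated to the compression unit in the block diagram.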