DLSS uses Tensor Cores and can do 4x upscaling from 1080p to 4K in just 1.5 ms. How is that possible? And this happens even on a lower-tier GPU like the RTX 3060.
Please check the links below, as they might answer your questions.
Is DLA supported on RTX 30XX GPUs and the Quadro A6000?
When I try to convert a simple test model, I get an error on a Quadro A6000 using the docker image nvcr.io/nvidia/tensorflow:21.05-tf2-py3:
trtexec --onnx=/weights/onnx/model.onnx --saveEngine=/weights/onnx/model-rt.trt --explicitBatch --fp16 --optShapes=input:0:8x256x256x3 --workspace=35000 --threads --dumpProfile --noBuilderCache --useDLACore=0 --allowGPUFallback --verbose
[07/27/2021-23:02:51] [E] Cannot create DLA engine, 0 not available
[07/27/2021-23:02:51] [E] Engine creation failed
[07/27/2021-23:02:51] [E] Engine set up failed
Please refer to Support Matrix :: NVIDIA Deep Learning TensorRT Documentation to check DLA support.
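Since desktop Ampere GPUs have no DLA block, a sketch of the same build with the DLA flags dropped (same paths, shapes, and options as your original command) would be:

```shell
# Build the engine for the GPU only: same command as before,
# minus --useDLACore=0 and --allowGPUFallback
trtexec --onnx=/weights/onnx/model.onnx \
        --saveEngine=/weights/onnx/model-rt.trt \
        --explicitBatch --fp16 \
        --optShapes=input:0:8x256x256x3 \
        --workspace=35000 --threads \
        --dumpProfile --noBuilderCache --verbose
```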
So only the Jetson AGX Xavier has a DLA block, not desktop RTX GPUs. And my question is still not answered: how is it possible that DLSS is so fast?
But I'm looking at it from an AI perspective too, and I'm curious how it works under the hood. Basically, speed is king for DLSS, but how do you optimize for it?
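A back-of-envelope estimate shows why the 1.5 ms budget forces a very small, tensor-core-optimized network. The layer shape and throughput below are my own assumptions for illustration (a 3x3 conv with 32 channels at 1080p, ~51 TFLOPS dense FP16, roughly an RTX 3060), not the actual DLSS network:

```python
# Back-of-envelope: how much convolution fits in a 1.5 ms frame budget?
# Assumed (hypothetical) numbers, not NVIDIA's real DLSS network:
#   - network runs at the 1080p input resolution
#   - 3x3 convolutions with 32 input and 32 output channels per layer
#   - ~51 TFLOPS dense FP16 tensor-core throughput (roughly an RTX 3060)

H, W = 1080, 1920            # input resolution (1080p)
K, C_IN, C_OUT = 3, 32, 32   # kernel size and channel counts
TFLOPS = 51e12               # assumed FP16 tensor-core peak throughput

# FLOPs for one conv layer: 2 ops (multiply + add) per weight per output pixel
flops_per_layer = 2 * K * K * C_IN * C_OUT * H * W
ms_per_layer = flops_per_layer / TFLOPS * 1e3

print(f"per-layer cost: {flops_per_layer / 1e9:.1f} GFLOPs "
      f"~= {ms_per_layer:.2f} ms at peak throughput")
# Each such layer costs ~0.75 ms even at peak, so only a couple of layers
# fit in 1.5 ms: the network must be compact, run in FP16 on tensor cores,
# and use fused/optimized kernels to get anywhere near peak utilization.
```

This is exactly the kind of budget where an inference optimizer like TensorRT (layer fusion, FP16/INT8 kernels) pays off, which is presumably why you were benchmarking with trtexec in the first place.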