Is it possible to combine object detection with STT?

Is it possible to run object detection together with Speech To Text (STT) on an input video that passes through DeepStream?