How to get audio input and output while using Jetson to run inference on video?

We are using a Jetson to run image inference on video input, and the video is passed on to a display together with the inference result. The video comes from an external source fed into the Jetson and is shown on a screen. The problem is that the jetson.utils APIs we are using do not seem to handle audio input anywhere. We still need the audio to be played together with the video. How can we get access to the audio within a single program?

Also, for implementation reasons, we can’t use DeepStream, which seems to be able to handle audio together with video input.