I would like to do some custom audio classification using Python.
Is there a tutorial for this on the NVIDIA Jetson Nano?
If training on the device is not possible, I can produce an ONNX file elsewhere. Is there a tutorial on how to import the ONNX model and run inference? And how do I get the audio from an IP cam?
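For the ONNX part of the question, a minimal inference pass with `onnxruntime` (which NVIDIA ships wheels of for Jetson) could look like the sketch below. The model file name, input shape, and label list are placeholders, not anything from this thread:

```python
# Sketch: classify one audio clip with an ONNX model via onnxruntime.
# Assumptions (placeholders, not from the thread): the model path, the
# feature shape expected by the model, and the label list.
import numpy as np

def softmax(logits):
    """Convert raw model outputs (logits) to probabilities."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def classify(model_path, features, labels):
    """Run one inference pass; `features` is a float32 array shaped
    the way the exported model expects, e.g. (1, num_features)."""
    import onnxruntime as ort  # pip install onnxruntime (or the Jetson wheel)
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    logits = sess.run(None, {input_name: features})[0][0]
    probs = softmax(logits)
    return labels[int(np.argmax(probs))], float(np.max(probs))
```

Usage would be along the lines of `classify("sound_model.onnx", feats, ["dog", "siren", "glass"])`, where `feats` comes from whatever feature extraction the model was trained with.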
Do you want to find a speech recognition sample for Jetson?
If yes, it’s recommended to check the sample below:
It is a pre-trained model, I assume. And I cannot re-train it the way I can with object detection, right?
My application is simpler: I just need to distinguish several different sounds, no need to convert speech to text.
I'm still looking for a way to get the audio from an IP cam through the RTSP protocol…
If there were sample Python code that gets the RTSP audio and runs inference on it, that would be great.
(just like dusty-nv's object detection samples)
And how do I get the RTSP audio?
I found this in the forum
How to install PyAudio? (L4T 32.2.3) - Jetson & Embedded Systems / Jetson Nano - NVIDIA Developer Forums
But it gets the audio from a WAV file, not from an RTSP audio stream…
Is there any way on Jetson to convert RTSP audio into a virtual microphone?
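One common way to get RTSP audio into Python without a virtual microphone is to let `ffmpeg` decode the stream and pipe raw PCM into the process. This is a sketch, assuming `ffmpeg` is installed on the Jetson; the RTSP URL is a placeholder:

```python
# Sketch: decode the audio track of an RTSP stream into a numpy array
# by piping raw PCM out of ffmpeg. Assumptions: ffmpeg is installed;
# the RTSP URL passed in is a placeholder for the real camera URL.
import subprocess
import numpy as np

def pcm16_to_float(raw):
    """Convert little-endian 16-bit PCM bytes to floats in [-1, 1]."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0

def read_rtsp_audio(url, seconds=5, rate=16000):
    """Grab `seconds` of mono audio from an RTSP stream as float samples."""
    cmd = [
        "ffmpeg", "-loglevel", "quiet",
        "-rtsp_transport", "tcp",      # TCP is usually more reliable than UDP
        "-i", url,
        "-vn",                         # drop the video track
        "-ac", "1", "-ar", str(rate),  # mono, fixed sample rate
        "-f", "s16le", "-t", str(seconds),
        "pipe:1",
    ]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    return pcm16_to_float(raw)
```

The returned array can then be fed into the same feature extraction used for the USB-microphone classifier.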
To summarize what I have done:
I can use my USB microphone to do sound classification (using TensorFlow).
How do I convert the IP camera audio (RTSP) to a virtual microphone?
If it works, I will be happy.
Unfortunately, we don’t have a sample that deals with the RTSP audio input.
But there is a sample in DeepStream that uses an audio file as input.
Since DeepStream is a GStreamer-based component, it may be easier to just replace the source with an RTSP audio input.
Mmm… I've had bad experiences with DeepStream before…
Plan B: is there any way to convert the RTSP audio to a virtual microphone? Thanks.
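For what it's worth, the "virtual microphone" plan can be approximated on a desktop-style Linux setup with PulseAudio: create a null sink, play the decoded RTSP audio into it, and record from the sink's `.monitor` source as if it were a mic. The sketch below only builds and launches the commands; it assumes PulseAudio and ffmpeg are installed, and the sink name and URL are placeholders:

```python
# Sketch of "plan B": expose RTSP audio as a virtual microphone via a
# PulseAudio null sink. Assumptions: PulseAudio and ffmpeg are both
# installed and running; sink name and RTSP URL are placeholders.
import os
import subprocess

def null_sink_cmd(sink="rtsp_mic"):
    """pactl command that creates a null sink; recording apps can then
    open the source named '<sink>.monitor' like a microphone."""
    return ["pactl", "load-module", "module-null-sink",
            f"sink_name={sink}",
            f"sink_properties=device.description={sink}"]

def feed_cmd(url):
    """ffmpeg command that decodes the RTSP audio track to PulseAudio."""
    return ["ffmpeg", "-loglevel", "quiet", "-rtsp_transport", "tcp",
            "-i", url, "-vn", "-f", "pulse", "rtsp-audio-feed"]

def start_virtual_mic(url, sink="rtsp_mic"):
    """Create the sink, then keep ffmpeg feeding it in the background."""
    subprocess.run(null_sink_cmd(sink), check=True)
    env = dict(os.environ, PULSE_SINK=sink)  # route ffmpeg's output to our sink
    return subprocess.Popen(feed_cmd(url), env=env)
```

After `start_virtual_mic("rtsp://<camera>/stream")`, a recording tool (e.g. PyAudio) should see a capture device named `rtsp_mic.monitor`. Note this depends on PulseAudio being the active sound server on the Jetson image in use.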
There is a sample that uses RTSP MP4 streaming and extracts the soundtrack.
Could you check if the following sample works for you?
$ cd /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-audio/configs
# set the RTSP stream in ds_audio_sonyc_rtsp_test_config.txt
$ deepstream-audio -c ds_audio_sonyc_rtsp_test_config.txt
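The thread does not show the contents of ds_audio_sonyc_rtsp_test_config.txt. As an illustration only, deepstream-app-style configs select an RTSP source with a source group roughly like the following; the exact key names in the audio sample's config are an assumption here, so check the shipped file before editing:

```
[source0]
enable=1
# type=4 selects an RTSP (URI) source in deepstream-app style configs
type=4
uri=rtsp://<camera-ip>:554/<stream-path>
num-sources=1
```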