Retrieve audio sample from NvAudioBuf


• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version 6.0
• JetPack Version (valid for Jetson only) 4.6.1

Hello guys,

I’m trying to get an audio application running in Python.
So far I have been able to build the Python bindings that define NvDsAudioFrameMeta, NvBufAudio and NvBufAudioParams.

Similar to the get_nvds_buf_surface function we use to retrieve the frame in a vision app, I tried to write my own function to retrieve the audio sample.

        m.def("get_nvds_buf_audio",
              [](size_t gst_buffer, int batchID) {
                  auto *buffer = reinterpret_cast<GstBuffer *>(gst_buffer);
                  GstMapInfo inmap;
                  gst_buffer_map(buffer, &inmap, GST_MAP_READ);
                  auto *inputnvsurface = reinterpret_cast<NvBufAudio *>(inmap.data);
                  gst_buffer_unmap(buffer, &inmap);
                  
                  // int size = sizeof(inputnvsurface->audioBuffers) / sizeof(NvBufAudioParams);
                  // std::cout << "num elemnts: " << size << '\n';
                  // std::cout << "BufAudio batch size: " << inputnvsurface->batchSize << "\n";


                  auto input_surface = inputnvsurface->audioBuffers[batchID];
                  // std::cout << "0 audio buf with batch id: : " << batchID << "\n";
                  // std::cout << "0 audio buf: : " << input_surface.duration << "\n";
                  
                  uint32_t rate = input_surface.rate;
                  uint32_t channels = input_surface.channels;
                  uint32_t bpf = input_surface.bpf;
                  uint32_t data_size = input_surface.dataSize;
                  uint64_t duration = input_surface.duration;
                  auto dtype = py::dtype(
                          py::format_descriptor<unsigned char>::format());

            //       std::cout << "BufAudio num filled: " << inputnvsurface->numFilled << "\n";
            //       std::cout << "BufAudio batch size: " << inputnvsurface->batchSize << "\n";
            //       std::cout << "BufAudio iscontiguius: " << inputnvsurface->isContiguous << "\n";


            //      std::cout << "AudioParams datasize: " << data_size << "\n";
            //      std::cout << "AudioParams bufPTS: " << input_surface.bufPts << "\n";
            //      std::cout << "AudioParams Duration: " << duration << "\n";
            //      std::cout << "AudioParams rate: " << rate << "\n";
            //      std::cout << "AudioParams channels: " << channels << "\n";
            //      std::cout << "AudioParams bpf: " << bpf << "\n";


                  return py::array (dtype,
                                   {data_size,},
                                   {sizeof(unsigned char) * 8},
                                   (const unsigned char *) input_surface.dataPtr,
                                   py::cast(input_surface.dataPtr)
                  );

              },
              "gst_buffer"_a,
              "batchID"_a,
              py::return_value_policy::reference,
              pydsdoc::methodsDoc::get_nvds_buf_surface
        );

I was only able to retrieve a very small amount of data.

Can someone help me out with this?

In the documentation, NvBufAudio.batchSize and NvBufAudio.numFilled have exactly the same description.

The value of NvBufAudio.batchSize is always 1, but NvBufAudio.audioBuffers can still be accessed at other indices.

The data is not clear. Should we iterate over NvBufAudio.audioBuffers? Is that why NvBufAudioParams.dataSize is always 2520?
What are the units of NvBufAudioParams.duration? An example value is 28571428.

Hope someone can help me with this! Thanks in advance!

It is just normal PCM data (see "Pulse-code modulation" on Wikipedia). You know the sample rate, the duration, the data type and the bytes per frame, so everything is already there. See the NvBufAudioParams Struct Reference in the NVIDIA DeepStream SDK API Reference.
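
To illustrate what "normal PCM data" means in practice, here is a minimal Python sketch that splits a raw byte buffer into per-channel samples. It assumes interleaved, signed 16-bit little-endian samples, which is only one possible layout; the actual data type and channel count have to be taken from the NvBufAudioParams fields. The helper name `decode_s16le` is made up for this example.

```python
import struct

def decode_s16le(data: bytes, channels: int) -> list[list[int]]:
    """Split interleaved signed 16-bit little-endian PCM into one list per channel."""
    bytes_per_frame = 2 * channels  # 2 bytes per sample, one sample per channel
    if len(data) % bytes_per_frame:
        raise ValueError("buffer length is not a whole number of frames")
    # Unpack every 16-bit sample, then de-interleave by taking every
    # `channels`-th sample starting at each channel offset.
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    return [list(samples[ch::channels]) for ch in range(channels)]

# Two stereo frames: (left=1, right=-1) and (left=2, right=-2).
raw = struct.pack("<4h", 1, -1, 2, -2)
left, right = decode_s16le(raw, channels=2)
```

If the buffer holds a different sample type (for example 32-bit float), only the `struct` format character and the per-sample size would change.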

What do you mean by “a very small amount of data”?
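
One possible cause, offered as a guess rather than a confirmed diagnosis: pybind11, like NumPy, expresses array strides in bytes, so the stride of `sizeof(unsigned char) * 8` in the binding above would step eight bytes per element instead of one. The plain-Python sketch below only illustrates that effect on a small buffer; `strided_view` is a hypothetical helper, not part of any binding.

```python
def strided_view(buf: bytes, stride: int, count: int) -> list[int]:
    """Return the bytes that `count` uint8 elements with the given byte stride would visit.

    This sketch stops at the end of the buffer. A native array built over a raw
    pointer has no such bound and would keep reading past the end.
    """
    return [buf[i * stride] for i in range(count) if i * stride < len(buf)]

buf = bytes(range(16))
contiguous = strided_view(buf, stride=1, count=len(buf))  # every byte
sparse = strided_view(buf, stride=8, count=len(buf))      # every eighth byte only
```

With a stride of 1 the view covers the whole buffer, while a stride of 8 touches only a fraction of the valid bytes.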

The number of entries to iterate over is NvBufAudio.numFilled.
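
As a rough sketch of that loop in Python: the classes below are stand-ins invented for this example, not the real bindings, and they assume the batch exposes a filled count and a list of per-buffer entries. Treating the batch size as an allocated capacity is also an assumption made only for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class FakeAudioParams:
    """Stand-in for one NvBufAudioParams entry (hypothetical, payload only)."""
    data: bytes

@dataclass
class FakeAudioBatch:
    """Stand-in for NvBufAudio (hypothetical field names)."""
    batch_size: int
    num_filled: int
    audio_buffers: list = field(default_factory=list)

def filled_payloads(batch: FakeAudioBatch) -> list[bytes]:
    """Collect the payload of every filled entry, and only those."""
    return [batch.audio_buffers[i].data for i in range(batch.num_filled)]

batch = FakeAudioBatch(
    batch_size=4,
    num_filled=2,
    audio_buffers=[FakeAudioParams(b"ab"), FakeAudioParams(b"cd"),
                   FakeAudioParams(b""), FakeAudioParams(b"")],
)
payloads = filled_payloads(batch)
```

Entries at indices beyond the filled count may be addressable, but nothing guarantees they hold valid audio.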

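NvBufAudioParams.duration is the duration of the buffer. The example value 28571428 appears to be in nanoseconds, the usual GStreamer time unit, which is about 28.57 ms.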

The data size is the audio frame size. If the data type, sample rate and duration are fixed, the frame size is fixed too.
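
The relationship can be checked with a little arithmetic. The sketch below reads the duration as nanoseconds and assumes a 44100 Hz sample rate with 2 bytes per frame (mono, 16-bit); neither the rate nor the bytes per frame is stated in this thread, so substitute the values reported by NvBufAudioParams. The function name is made up for the example.

```python
def expected_data_size(rate_hz: int, bytes_per_frame: int, duration_ns: int) -> int:
    """Bytes in a buffer of the given duration: frames = rate * seconds, bytes = frames * bpf."""
    frames = round(rate_hz * duration_ns / 1_000_000_000)
    return frames * bytes_per_frame

# Assumed stream parameters (not from the thread): 44100 Hz, 2 bytes per frame.
size = expected_data_size(rate_hz=44100, bytes_per_frame=2, duration_ns=28571428)
```

Under those assumptions the result is 2520 bytes, which matches the dataSize reported in the question, so a constant dataSize simply reflects a constant buffer duration.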