Intelligent Bat Detector

Bats emit echolocation calls at high frequencies to enable them to ‘see’ in the dark, and we humans need to be able to monitor the species to help prevent destruction of their environment during our ever increasing exploitation of the planet. Recording and analysing bat call audio is a great way to achieve this monitoring, especially if it can be fully automated.

This particular bat detector device can be run off 12 volt batteries and deployed in the wild for days or weeks at a time, with data being transmitted every now and again via a LoRa radio link (RPi only). Essentially, it records 30 second chunks of audio, does a quick analysis of that audio using machine learning, and then renames the audio file with species, confidence and date if it detected a bat. All other recordings are automatically deleted to save disc space and the biologist / researcher's time.

The core ingredients of this project are:

What have been the key challenges so far?

  • Choosing the right software. Initially I started off using a package designed for music classification called 'pyAudioAnalysis', which gave options for both Random Forest and Deep Learning (via Tensorflow) aimed at human voice recognition. Both systems worked OK, but the results were very poor. After some time chatting on this very friendly Facebook group: Bat Call Sound Analysis Workshop, I found a software package written in the R language with a decent tutorial that worked well within a few hours of tweaking. As a rule, if the tutorial is crap, then the software should probably be avoided! The same was true when creating the app with the touchscreen - I found one really good tutorial for Gtk 3 + Python, with examples, which set me up for a relatively smooth ride.
  • Finding quality bat data for my country. In theory, there should be numerous databases of full spectrum audio recordings in the UK and France, but when actually trying to download audio files, most of them seem to have been closed down or limited to the more obscure 'social calls'. The only option was to make my own recordings, which was actually great fun - I managed to find 6 species of bat in my back yard. This was enough to get going.

  • Using Gtk 3 to produce the app. Whilst Python itself is very well documented on Stack Exchange etc., solving more detailed problems with Gtk 3 was hard going. One bug was completely undocumented and took me 3 days to remove! The toolkit is also rather clunky and not particularly user friendly or intuitive. Compared to ordinary programming with Python, Gtk was NOT an enjoyable experience, although it's very rewarding to see the app in action.

[img]https://cdn.hackaday.io/images/8400571581089194451.jpg[/img]
  • Designing the overall architecture of the app - Gtk only covers a very small part of the app: the touch screen display. The rest of it relies on various Bash and Python scripts that interact with the main deployment script, which is written in R. Learning the R language was really not a problem as it's a very generic language and only seems to differ in its idiosyncratic use of syntax - just like any other language, really. The 'stack' architecture initially evolved organically with a lot of trial and error. As a Hacker, I just put it together in a way that seemed logical and did not involve too much work. I'm far too lazy to learn how to build a stack properly, or even learn any language properly, but after giving a presentation to my local university computer department, everybody seemed to agree that that was perfectly OK for product development. Below is a quick sketch of the stack interactions, which will be pure nonsense to most people but is invaluable for reminding myself how it all works:
[img]https://cdn.hackaday.io/images/7482881581167935194.png[/img]
  • Creating a dynamic bar chart - I really wanted to display the results of the bat detection system in the simplest and most comprehensible way, and the boring old bar chart seemed like the way forwards. However, to make it a bit more exciting, I decided to have it update dynamically, so that as soon as a bat was detected the results would appear on the screen. Using spectrograms might have been OK, but they're quite hard to read on a small screen, particularly if the bat call is a bit faint. After ten days of trial and error, I got a block of code working in the R deployment script that produced a CSV file with all the correctly formatted labels and table data, comprehensible to another Python script that uses the ubiquitous matplotlib library to create a PNG image for Gtk to display. The crux of it was getting the legend to automatically self-initialise, otherwise it would not work when switching to a new data set. Undoubtedly, this has saved a whole load of trouble in the future.
    
  • Parallelism - some parts of the stack, most particularly the recording of live audio, have to run seamlessly, one chunk after another. This was achieved in Bash using some incredibly simple syntax - the & character and the wait command. It's all done in two very neat lines of code:

    # Record one chunk of mono audio at 384 ks per second in the background,
    # then pause the script until the recording has finished:
    arecord -f S16 -r 384000 -d ${chunk_time} -c 1 --device=plughw:r0,0 /home/tegwyn/ultrasonic_classifier/temp/new.wav &
    wait
    
  • Choosing to use the Bash environment for recording audio chunks was a bit of a no-brainer due to the ease of use of the ALSA library and its ability to record at 384 ks per second. I did not even consider the possibility of doing this any other way. More recently, I realised that some parts of the stack needed to be linear, in that blocks of code needed to run one after the other, while other blocks needed to run concurrently. This was most obvious with the deployment of the Random Forest models, in that they only needed to be loaded into memory once per session rather than every time a classification was required. It was actually quite fun to re-organise the whole stack, but it required that I documented what every script did and thought really carefully about how to optimise it all. The different parts of the stack, written in different languages, communicate with each other by polling various text files in the 'helpers' directory, which very often don't even have any contents!
    
  • Finding a decent battery to 5 V switching regulator and fuel gauge - it's quite amazing: nobody has yet created a compact 5 V power supply that can monitor the battery's state of charge AND deliver a steady 5 V from a lead acid battery AND switch at a frequency above 384 kHz. Fortunately, after poring over various datasheets for a day or two, I found one chip made by Monolithic Power Systems that seemed to meet all the specs. Even more fortuitously, the company supplies a nice evaluation board at a reasonable price that did not attract customs and handling fees from the couriers. Well done Monolithic - we love you soooooo much! After running a full CPU and GPU stress test for 10 minutes, the chip temperature was only 10 degrees above ambient.
    
  • Optimising the code for minimal power usage and minimal SD card stress - this involved completely redesigning part of the stack so that the classification scripts, written in R, became asynchronous: on pressing the start button, the script runs in a continuous loop, forever waiting for a new .wav chunk to appear in the 'unknown_bat_audio' directory. The advantage of doing this is that the first part of the script can be isolated as a 'set-up' block which loads all the .rds model files into memory in a one-off hit, rather than having to do this for every audio chunk created.
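As a sketch of how the empty flag files and the set-up-once-then-loop structure hang together - all paths and names below are stand-ins of my own, and the real loop lives in the R deployment script, not Bash:

```shell
#!/bin/bash
# Bounded sketch only: the real script loops forever and is written in R.
watchdir=$(mktemp -d)         # stands in for the unknown_bat_audio directory
helpers=$(mktemp -d)          # stands in for the 'helpers' flag directory

echo "set-up: models loaded into memory once"

touch "$watchdir/new.wav"     # simulate arecord finishing a chunk
touch "$helpers/chunk_ready"  # empty flag file signals the classifier

# One pass of the poll loop (the real thing wraps this in 'while true'):
if [ -f "$helpers/chunk_ready" ]; then
    rm "$helpers/chunk_ready"             # consume the flag
    for wav in "$watchdir"/*.wav; do
        echo "classifying $(basename "$wav")"
        rm "$wav"                         # bin the chunk once classified
    done
fi
```

The flag files carry no data at all - their mere existence is the message, which is about the simplest cross-language IPC there is.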
    

    Features:

    • Full spectrum ultrasonic audio recording in mono at 384 ks per second.
    • Results can be displayed in real-time with a 30 second delay, in either text, spectrogram or bar chart format.
    • Runs off a 12 V battery or any power supply from 6 to 16 V.
    • Software is optimised for power saving and speed.
    • Average battery life is about 5 hours using 10 x 1.2 V NiMH AA batteries.
    • Automatically classifies the subject at a choice of resolutions, e.g. animal / genus / species.
    • Retains data even if the classifier is only 1% confident, up to a set limit, e.g. 5 GB, and then starts deleting the worst of it to prevent data clogging.
    • Batch data processing mode can be used for re-evaluating any previous data or new data from other sources.
    • Open source software: https://github.com/paddygoat/ultrasonic_classifier.
    • New models for new geographical zones can be trained using the core software.
    • Data is transmitted to the cloud via LoRa.
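The retention rule above can be sketched in Bash. The filename scheme here is my own guess, based on the species / confidence / date renaming described earlier, and a simple file count stands in for the real 5 GB disc-usage check:

```shell
#!/bin/bash
# Sketch only: filenames and the limit are stand-ins, not the real scheme.
datadir=$(mktemp -d)
touch "$datadir/pipistrellus_3_2020-02-01.wav" \
      "$datadir/myotis_97_2020-02-01.wav" \
      "$datadir/plecotus_55_2020-02-01.wav"
max_files=2     # the real check would be du -s against a 5 GB limit
while [ "$(ls "$datadir" | wc -l)" -gt "$max_files" ]; do
    # field 2 of the '_' separated name is the confidence; drop the lowest
    worst=$(ls "$datadir" | sort -t_ -k2 -n | head -n 1)
    rm "$datadir/$worst"
    echo "pruned $worst"
done
```

Sorting numerically on the confidence field means the least trustworthy recordings are always the first to go.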

    DEMO VIDEO: https://youtu.be/-FsXWqGqNaE

    Updates:

    We recently got LoRa working on the Nano by using a special script to initiate SPI, together with Adafruit's CircuitPython TinyLoRa library. Watch this space for more info !!

    From Random Forest to Inception V3

    Now to instigate Deep Learning. As opposed to the Random Forest method, using a deep convolutional network such as Google’s Inception does not require features to be extracted using custom coded algorithms targeted at the animal’s echolocation voice. The network works it all out on its own, and a good pre-trained model already has a shed load of features defined which can be recycled and applied to new images completely unrelated to the old ones. Sounds too good to be true?

    To start with, I was very sceptical about it being able to tell the difference between identical calls at different frequencies, which is important when trying to classify members of the Pipistrelle genus. Basically, calls above 50 kHz are from Soprano Pips and calls under 50 kHz are Common Pips (there are other species in this genus, but not where I live!). So can the network tell the difference? The answer is both yes and no, since we are forgetting one major tool at our disposal - data augmentation.

    Data augmentation can take many different forms, such as flipping the image horizontally, vertically or both, to give us 4x more data. Great for photos of dogs, but totally inappropriate for bat calls! (Bats never speak in reverse or upside down.) Scaling and cropping are also inappropriate, as we need to keep the frequency axis intact. Possibly the only thing we can do is move the main echolocation calls along the time axis … so if we have a 20 msec call in a 500 msec audio file, we can shift that call sideways in the time frame as much as we want. I chose to shift it (about) 64 times with some simple code to create a ‘sliding window’. The code uses the ImageMagick ‘mogrify’ command (run from Bash) which, strangely, only works properly on .jpg images.

    Essentially, the core of it looks like this:

    # Convert all .png files to .jpg or else mogrify won't work properly:
    ls -1 *.png | xargs -n 1 bash -c 'convert "$0" "${0%.png}.jpg"'
    # Delete the .png files:
    find . -maxdepth 1 -type f -iname \*.png -delete

    # Now split up all the 0.5 second long files into 8 parts of 680 pixels each.
    # The working directory is hard-coded for this particular SD card:
    dir=/media/tegwyn/Xavier_SD/dog-breed-identification/build/plecotus_test_spectographs/test
    for file in *.jpg
    do
        fname="${file%.*}"

        # Trim the full spectrogram down to a 5500 x 680 region of interest:
        mogrify -crop 5500x680+220+340 "$dir"/"$fname".jpg

        # Slide a 680 pixel window along the time axis to produce images 1 to 8:
        for n in 1 2 3 4 5 6 7 8
        do
            offset=$(( (n - 1) * 680 ))
            cp "$fname".jpg "$fname"_"$n".jpg
            mogrify -crop 680x680+"$offset"+0 "$dir"/"$fname"_"$n".jpg
        done
    done
    

    The final code is a bit more complicated than this, but not by much!

    Suddenly we’ve got about 64x the amount of data. But what do the images contain? What if they contain the gaps between calls - i.e. nothing? … So now the images needed to be inspected one at a time to make sure that they actually had relevant content … all 30,000 of them! At a processing rate of about 2 per second, this took about 4 hours. That’s 4 hours of sitting in front of a screen pressing the delete and forward arrow keys. Was it worth it?

    I’m not going to go through the process of setting up the software environment for training the Inception classifier using Tensorflow on an Nvidia GPU, as most probably, by the time I’ve finished typing it out, it will have changed. I used to be able to use my Jetson Xavier to train on Nvidia’s DetectNet, but guess what? … Yes, the software dependencies changed slightly and the system won’t run without unfathomable critical errors.

    Also, it’s the age old thing: there are dozens of tutorials for classifying images, and 95% of them are incomplete, irrelevant, out of date or simply don’t work. After a lot of code skim-reading and a bit of trial and error, I settled on THIS ONE.

    What’s great is that the data is really well organised in a very simple manner and the code is well documented and easy to use. Only 2 main files: retrain.py and classify.py. It’s supposed to classify dog breeds, but it works perfectly on bat spectrograms as well!

    Throw each species’ images into its own folder, label the folder with the species name, chuck the labelled folders into the ‘dataset’ folder, delete all the dog stuff and run the retrain.py script. Very simple. After training, find ‘retrained_labels.txt’ and, after a few tests, change the first few lines according to bat species. Test the classifier on fresh data by putting it into the ‘test’ folder and running the classify.py script. During training, even the TensorBoard function worked:
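The folder juggling boils down to something like this - the species folder names below are my own illustrative picks, not the tutorial's, and the class label is simply the folder name:

```shell
#!/bin/bash
# Hypothetical sketch of the dataset layout: one folder per species
# is all the labelling the retraining script needs.
base=$(mktemp -d)      # stands in for the tutorial's repo directory
mkdir -p "$base/dataset/pipistrellus_pipistrellus" \
         "$base/dataset/pipistrellus_pygmaeus" \
         "$base/dataset/myotis_daubentonii"
ls "$base/dataset"     # one sub-folder per species, ready for retraining
```

With the folders in place, the two scripts are run as per the tutorial's own instructions.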

    Now test a batch of spectrogram images, preferably unseen during training:

    Fortunately, the classifier worked properly on the Pipistrelle species, correctly classifying all the Soprano Pips as sopranos. What’s interesting is that Common Pip is a close second, and sometimes very close, which is exactly what we would expect. Great - the system is working!

    4G LTE modem Enabled !!

    The Sierra Wireless EM7455 is a high end Cat 6 4G LTE modem with a whole load of yummy features, such as 300 Mb per sec download and 50 Mb per sec upload speeds … Here’s the FULL SPEC. In the past I have used some 3G and 2G modems, but only ever got them to work in GPRS mode, which is fine for sending lots of text via some rather insecure methods such as HTTP ‘GET’ or ‘POST’, but not good for uploading spectrogram image files to an Amazon server, for example. Another great feature of this device is that it’s extremely compact and slots nicely into an M.2 Key B connector.

    The device requires a USB carrier board to connect to the Raspberry Pi or Jetson Nano, and there are a few possibilities here, although we opted for the Linkwave version and bought a high quality antenna on a 10 metre cable at the same time. OK, it was expensive, but the results are worth it, as being able to send images quickly means less battery juice being consumed.

    Connecting to the Raspberry Pi was just a matter of installing ‘network manager’ and creating a modem connection with the correct APN settings. For my network, Three in the UK, the APN was ‘3Internet’ with no password or username. Simple! Getting functionality with the Jetson Nano was a different matter, and required doing a live probe on the system drivers being used on the Raspberry Pi using:

    tail -f /var/log/syslog
    

    … run in the command line. Eventually I worked out that the most essential driver was qcserial, which is short for ‘Qualcomm serial modem’, and this then had to be enabled in the Jetson Nano kernel … So with a fresh 128 GB SD card I flashed the Nano from a host computer using the latest Nvidia SDK Manager package, expanded the file system from 16 GB to 128 GB and started messing with the drivers using these scripts from JetsonHacks: https://github.com/JetsonHacksNano/buildKernelAndModules
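For reference, the Raspberry Pi connection described above can also be created from the command line with NetworkManager's nmcli tool. This is a sketch only: the connection name is my own invention, and the APN is the one quoted above for Three UK:

```shell
# Hypothetical connection name 'three-4g'; APN as used above for Three UK.
# Create a GSM/LTE connection on whichever interface the modem exposes:
nmcli connection add type gsm ifname '*' con-name three-4g apn 3Internet
# Bring it up:
nmcli connection up three-4g
```
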

    17,300 Spectrograms Can't be Wrong !

    Although the current software stack can automatically generate thousands of spectrograms from almost nothing, each one of them has to be manually checked by human eye. I tried to train my dog to do this for me, but it just cost me a whole load of German sausage for nothing.

    Here are a few examples of auto generated spectrograms for Daubenton’s bat, ready for training. Each one takes about 0.5 seconds to check by eye:




