One of the great things about being a subscriber to The MagPi magazine is the amount of quality equipment they give away for free. But to be honest I wasn’t expecting what came in the issue before last: a whole voice kit consisting of the Voice HAT, speaker, microphone, arcade-style button, cables and, of course, the cardboard box. Plus a ready-made SD image.


Now, I’ve been involved with voice recognition on the Raspberry Pi before, and there were always niggly bits that made it unfeasible to program. Getting SoX to work properly, i.e. to start recording when the voice level rose and stop recording on reaching a pause, was horrendous. You needed to wait until the file had been saved before sending it to Google for analysis, which meant there were latency issues. I also tried using the Chrome browser component, but on a local web server without a certificate the browser would ask for permission to use the microphone every time. However, with this kit all these issues were pretty much sorted; in fact it exceeded all expectations.

As fate would have it, I got roped into a hackathon where each team had to create a working prototype of a voice assistant, or another system such as OCR, for the insurance industry. It was to be completed in two days and presented on the third. Luckily I was able to build this gizmo over the weekend. And it had a name, “the lady box”, given by one of the female members of the team.

Trying it out, you have two modes. The first uses the Google Assistant, which is pretty awesome in its knowledge and humour. I asked about the company that I work for and then followed up with “who is the CEO?”, and it understood the context and got the answer right. When I asked “Do you love me?” the answer was “well, actually I do, in a service kind of way”. More scary was when I asked “what is my name?”: it answered correctly, and when I asked “how do you know?” the reply was “because you told me”. I eventually realized that the info had come from my Google account. As the voice is created at Google’s end, it has top-notch intonation as well. To be honest it didn’t get anything wrong, though as would be expected it couldn’t understand absolutely everything. What it did seem to do well was pick up my voice from across the room, and it seemed able to filter out noise from the television, which was on at the same time. Unlike the Amazon Echo, which uses the key phrase “Alexa”, the way to wake this up is either to press the button on the top or to give a single clap. It would be good if someone programs other triggers in the near future.

The other mode uses Google cloud services, which lets you program actions that depend on recognizing keywords. For this purpose our team designed a program that would allow the customer to request information about their insurance in general, request information on their policy, and change elements of their policy such as the vehicle insured or the no claims bonus. The quality of the recognition is as good as in the other mode; the drawback is that the text to speech isn’t as good, because it is generated locally. We cut a few corners by not keeping a question context state, which meant that in order to change the vehicle on their policy the customer would need to say “my new vehicle registration is CE690 X0P”, as just saying the registration wouldn’t fit into the keyword system of the code. However, this is something that could easily be enhanced given time.
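To give a flavour of what a keyword system like this looks like, here is a minimal sketch in plain Python. It is not our hackathon code and it doesn’t touch the kit’s own library; the `INTENTS` table, the action names and the `match_intent` function are all made up for illustration. It assumes the recognized speech has already arrived as a text string.

```python
import re

# Hypothetical keyword table (illustration only): each entry pairs the
# keywords that must ALL appear in the transcript with an action name.
# Entries are checked in order, so more specific ones come first.
INTENTS = [
    ({"new", "vehicle", "registration"}, "change_vehicle"),
    ({"no", "claims", "bonus"}, "change_no_claims_bonus"),
    ({"policy"}, "policy_info"),
    ({"insurance"}, "general_info"),
]


def match_intent(transcript):
    """Return the first action whose keywords all occur in the transcript."""
    # Lower-case the text and split it into alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", transcript.lower()))
    for keywords, action in INTENTS:
        if keywords <= words:  # every keyword is present
            return action
    return None  # nothing recognized


if __name__ == "__main__":
    print(match_intent("my new vehicle registration is CE690 X0P"))
    print(match_intent("CE690 X0P"))
```

With no context state, a bare registration matches nothing, which is why the full sentence is needed. Remembering the last question asked would be one way to accept the shorter answer.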

So come the day of the presentation, the team had worked hard on getting a lot of facts, figures and info together on how to market and develop a system for the insurance industry. And against my better judgement we were going to do a live demo, though I insisted on doing a backup video in case of major technical problems. The person taking the part of the user had a French accent, but this didn’t seem to cause any issues. Everything went swimmingly until the very last command, which was “add breakdown cover”, for which it picked up “aaa breakdown cover”. But never mind: the judges seemed to be impressed with both the presentation and the demo, and we came first on the popular vote (second in the overall competition).

AIY Voice Kit with Machine Learning