Although it may seem like something out of a science fiction movie, today we are witnessing a real battle in the world of voice assistants, also known as virtual assistants or personal assistants. These assistants can help us with everyday tasks such as setting an alarm, sending an email, answering general knowledge questions and using a myriad of services, becoming the butlers of the 21st century. Does this conversation ring a bell?
HAL: I could sing a song for you.
Dave: Yes, I'd like to hear it, HAL. Sing it for me.
HAL: It's called "Daisy."
Yes, it's a conversation from 2001: A Space Odyssey, and it's no longer fiction but reality.
I am at your service
What are these assistants capable of doing? They help us with our day-to-day work by improving productivity: they can send emails, set appointments on the calendar, create alarms and timers, and even help us write office documents.
They keep us up to date with our trips and journeys, manage our plane and train tickets, let us know when we have to leave home if we want to be on time for our appointment, tell us how long it will take to get to our destination by different means of transport and offer us the best routes taking into account the state of the traffic.
They can recommend places for leisure and dining, suggest restaurants near us, inform us of sports results and show us the progress of our daily activity using the information collected by fitness trackers or the sensors of the mobile phone.
They offer us entertainment by recommending music, movies and series, and by telling us where to watch them, where to buy them and on which devices to play them. If we get bored, they can even hold a conversation with us and play simple games.
They help us with our communication, helping us send messages through instant messaging or social networks, and they can translate many languages.
We can ask them to collaborate with our Smart Home, where through sensors and different devices we can control the lights, thermostats, sprinklers and security cameras.
They help us answer general knowledge questions. Ask them about the capital of a country, or about recent events such as sports results, and the voice assistants will provide you with an answer, although with certain limitations. They don't always understand the question or know how to answer it, so sometimes they end up taking us to a simple web search on the terms.
There are many software companies developing their own voice assistants. Nearly all of them have something in common: they have invested heavily in artificial intelligence to learn from users, and they have launched, or are preparing, their own Smart Home Hubs, which let us talk to the assistants without having a mobile phone or laptop nearby and which connect to the different sensors in our home.
We can classify the competitors into three categories:
- The big four of voice assistants: large companies whose assistants are already on the market with a significant volume of users, namely Google Assistant, Apple Siri, Amazon Alexa and Microsoft Cortana.
- The candidates: major companies that are about to launch their assistants, and small companies that have been in the sector for a long time and promise to put up a fight against the leaders. In this group we include Samsung Bixby, which will soon be on the market, together with the acquired Viv Labs, founded by the original creators of Siri; Sony Xperia Agent, still in the concept phase; and Sherpa, an assistant created by a Spanish company that helps you without your having to talk to it.
- The fighters in the open source world, such as Mycroft and Lucida, which make powerful weapons available to developers to battle the big competitors.
Below is a comparison of the different assistants, with their Smart Home Hubs and a description of their main strengths.
| Assistant | Smart Home Hub | Strengths |
|---|---|---|
| Google Assistant | Google Home | It is present in Android, which leads the mobile phone market, and it is trying to increase its presence through Android Wear, Android TV, Android Auto and Android Things. Google's great investment capacity and its ecosystem of products and services place it in a privileged position. |
| Apple Siri | Apple TV | Like Google Assistant, it has a large customer base on mobile phones, but Apple also has its own wearables, laptops, desktop computers and Apple TV, a legion of unconditional fans and a strong investment in R&D. |
| Amazon Alexa | Amazon Echo | Its main strength comes from strategic alliances with different device manufacturers, whose products can be easily integrated to take advantage of all of Alexa's capabilities. Let's not forget that Amazon is the largest online store, so through Alexa you can make your purchases easily. |
| Microsoft Cortana | Invoke (coming soon) | With the largest market share in PCs and laptops, Cortana is the best positioned assistant for productivity tasks. Like Alexa, it will need partnerships with major device manufacturers to extend its reach. |
| Samsung Bixby | SmartThings | Samsung's main potential lies in its devices: it manufactures TVs, cameras, home appliances, mobile phones and tablets. This is its main asset for positioning its voice assistant and reaching the top positions. |
| Sony Xperia Agent | Sony Xperia Agent | Sony is positioned as a manufacturer of electronic devices: TVs, mobile phones, cameras, tablets, wearables and video game consoles. Unlike Samsung, Sony has chosen to use Android on its TVs and smart watches. We will see whether Sony presents itself as a strategic ally of Google or plans to stand on its own and produce its own software to compete. |
| Sherpa | N/A | We highlight this assistant because it was built for the Spanish language by a start-up from the Basque Country, which closed a funding round of 6 million euros in 2016 and reached an agreement with Samsung to have its app installed on the Samsung S7. We must follow this company closely to see how it evolves after Samsung launches its own assistant. |
| Mycroft | Mark I and Raspberry Pi | Originating from a crowdfunding campaign, Mycroft is a young company with a great capacity for innovation. The engine is in continuous development and, as an open source project, one of its best opportunities for growth is the collaboration of the community, which can bring new ideas, new software and connectors to different devices. |
| Lucida | N/A | Created in the Clarity Lab of the University of Michigan, its main strength is that it offers not only voice recognition services but also image recognition and the possibility of integrating your own services. Its development remains active, as can be seen in the project's activity on GitHub. |
To see a comparison of the capabilities of the big four, you can check this Business Insider entry. If you want to see a more or less complete list of the assistants and their features, you can visit the Wikipedia article on Voice Assistants.
How they work
Voice assistants have several components in common. First, in order to understand spoken audio, they need a speech recognition system, which transcribes the user's voice into text. This text is then processed by different language processing algorithms, in particular natural language understanding (NLU). Assistants try to analyze the sentence syntactically (parsing) to understand the user's objective and then connect to the appropriate service to launch the query. The NLU component is where all the magic happens; it is the most important part of the assistant, and we will analyze it later in another post. Finally, the assistant composes a sentence from the result and synthesizes it so we can hear the answer.
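The stages described above can be sketched in a few lines of Python. This is only a toy illustration under heavy assumptions, not how any of the assistants in this post is actually built: speech recognition and synthesis are replaced by stand-ins that treat the "audio" as UTF-8 text, the NLU step is a pair of hypothetical regular expressions, and the names `transcribe`, `understand`, `fulfil`, `synthesize`, `handle`, `PATTERNS` and `CAPITALS` are all invented for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    """The user's objective plus the parameters extracted from the sentence."""
    name: str
    slots: dict = field(default_factory=dict)


def transcribe(audio: bytes) -> str:
    """Stand-in for speech recognition: the 'audio' is just UTF-8 text here."""
    return audio.decode("utf-8")


# Hypothetical patterns; a real NLU component does far more than regex matching.
PATTERNS = [
    ("set_alarm", re.compile(r"set an alarm for (?P<time>\d{1,2}(:\d{2})? ?(am|pm)?)")),
    ("capital_of", re.compile(r"what is the capital of (?P<country>[a-z ]+)")),
]

# Tiny made-up knowledge base standing in for an external service.
CAPITALS = {"france": "Paris", "spain": "Madrid"}


def understand(text: str) -> Intent:
    """Toy NLU: match the sentence against known patterns and extract slots."""
    normalized = text.lower().strip(" ?.!")
    for name, pattern in PATTERNS:
        match = pattern.search(normalized)
        if match:
            slots = {k: v.strip() for k, v in match.groupdict().items() if v is not None}
            return Intent(name, slots)
    # Nothing matched: fall back to a plain web search on the terms.
    return Intent("web_search", {"query": text})


def fulfil(intent: Intent) -> str:
    """Call the service that matches the intent and compose the answer sentence."""
    if intent.name == "set_alarm":
        return f"Alarm set for {intent.slots['time']}."
    if intent.name == "capital_of":
        country = intent.slots["country"]
        capital = CAPITALS.get(country)
        if capital is None:
            return f"I don't know the capital of {country.title()}."
        return f"The capital of {country.title()} is {capital}."
    return f"Here is what I found on the web for: {intent.slots['query']}"


def synthesize(sentence: str) -> bytes:
    """Stand-in for speech synthesis: returns the sentence as bytes."""
    return sentence.encode("utf-8")


def handle(audio: bytes) -> bytes:
    """Full pipeline: recognition, understanding, service call, synthesis."""
    return synthesize(fulfil(understand(transcribe(audio))))


print(handle(b"What is the capital of France?").decode("utf-8"))
```

Keeping each stage as a separate function mirrors the description: any stand-in could be swapped for a real component without touching the others, and the `web_search` fallback plays the role of the simple web search that assistants resort to when they do not understand the question.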
And the winner is...
Without a doubt the winner is the end user, who benefits from fierce competition by being able to choose between different possibilities that can meet their needs.
What will the future bring? It seems that we are already immersed in a connected world, and as far as voice assistants are concerned there is still a long way to go. One of the trends being pursued is a shift from individual commands to a more conversational mode, in which there is a prior context that lets us speak naturally. Another card still to be played is integration with more Internet of Things devices to extend the assistants' functionality, although we will see a new battle over formats and standards, which we hope will be short and will let us enjoy the unlimited capabilities of these assistants.