Moving from Engineering to Orchestrating Conversations

Tsung-Hsien Wen

21 Jan 2019 - 7 minutes read

The idea of engineering chatbots has failed us for way too long. It’s time to embrace a new philosophy and start orchestrating conversations, through data! At PolyAI, it has been the cornerstone of an approach which has empowered us to innovate and challenge the status quo.

Ever since the imitation game was proposed by Turing in the 1950s, talking to machines has become a common scene in many movies and TV shows. Despite all the excitement over the past decades, robust task-specific dialogue agents have not become truly popular — and we are still very far from general-purpose conversational AI that can serve as a ubiquitous point of contact for all services. This may sound disappointing — an omniscient, all-powerful intelligent assistant is still very far from our daily lives, even after the recent deep learning renaissance and the fourth industrial revolution.

A virtual assistant like Samantha in “Her” is what we’ve all been waiting for. Courtesy of Warner Bros.

So, where are we in terms of creating powerful conversational interfaces? If you read Chatbots Magazine, you’ve seen predictions like these: 1) conversational interfaces might replace existing apps and websites to become the predominant interface for human-machine interaction; 2) chatbots will power 85% of all customer service interactions by 2020; 3) chatbots will be responsible for cost savings of over $8 billion annually by 2022, up from $20 million in 2017. If you take a quick look at these reports and the graphs displaying the projected growth of messaging platforms, you might even think that chatbots have already become a real thing, and you might consider building one yourself.

However, if you scroll through Facebook Messenger and engage with some of the hottest chatbots, conversations like this ensue:

All in all, chatbots just don’t live up to the standards any user would hope for.

So, Why Do Chatbots Suck?

There are many reasons why chatbots and conversational AI don’t meet our expectations. Headlines like “How Bots Will Completely Kill Websites and Mobile Apps” deliver plenty of hype, but are not yet supported by the current state of the technology. Users inevitably face disappointing interactions when trying chatbots. Although there are many other issues plaguing chatbot development, from integration to early adoption, what I want to talk about today is more fundamental: they are simply not good enough!

Don’t get me wrong, I think we have a good understanding of the building-block technologies of conversational AI. We know that the language understanding component should be divided into intent classification and slot-filling; we know that intent classification can be posed as ranking or as a multi-class classification problem, and one should be chosen over the other for certain use cases. We know that combining machine learning with existing parsers can dramatically decrease false positives for semantic parsing. We know that neural network embeddings and keyword matching (or TF-IDF) are complementary techniques — there are pros and cons for both approaches. We also know that Reinforcement Learning and Generative Models are at present not even remotely good enough for real-world conversational AI applications.

The industry has been failing mainly due to the way the different components and technologies are typically pieced together. Conversational AI is not just a new app, or a new website for a new era, but a completely different modality of human-machine interaction. This new experience cannot be simply replicated through a process of engineering via scripting and hand-crafting.

Don’t ENGINEER Conversations. It Doesn’t Work!

Contemporary chatbot development relies on decision-tree style conversational flow design. This principle comes in handy as it allows us to discretize complex and subtle conversational phenomena into distinctive, programmable subroutines, allowing us to “engineer” the interaction with the chatbot. However, once you try some of the popular bot platforms, you will realize how difficult it is to build a good chatbot by following this principle.

A decision tree, rule-based conversational agent for booking an Uber ride

The fundamental limitation of decision-tree style chatbot engineering is its anti-human nature. As we grow up and learn to speak, the way we form sentences and choose words becomes internalized as subconscious behavior. However, hand-crafting chatbots using decision trees means we have to deconstruct our subconscious language model and use an alternative, programmatic approach, where we have to account in advance for the variety of ways that users can ask for something. The designer has to anticipate all the paraphrases that users may use, and on top of that they have to account for all the words that could have been misrecognized by the speech recognition system. And to rectify these mistakes, the designer has to fully understand how these models could have misrecognized the words to begin with.
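To make this brittleness concrete, here is a minimal sketch of a hand-crafted, rule-based intent router. It is an illustration only: the `RULES` table, the intent names, and the `route` function are all hypothetical and are not taken from any particular bot platform.

```python
# Hypothetical hand-written rules for a ride-booking bot.
# Each intent fires only if one of its phrases appears verbatim.
RULES = {
    "book_ride": ["book a ride", "get me a car", "order an uber"],
    "cancel_ride": ["cancel my ride", "cancel the car"],
}


def route(utterance: str) -> str:
    """Return the first intent whose phrase occurs in the utterance."""
    text = utterance.lower()
    for intent, phrases in RULES.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "fallback"


print(route("Please book a ride to the airport"))  # book_ride
print(route("I need to get to the airport"))       # fallback (unseen paraphrase)
print(route("Cancel my rode"))                     # fallback (misrecognized word)
```

Every paraphrase or recognition error that the designer did not anticipate falls through to the fallback, so the rule table has to keep growing by hand.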

An attempt to somehow address this cognitive overload has led to an over-obsession with hand-crafting in order to improve modular functionalities such as classifying intents or recognizing keywords. Such over-engineering has led to a complete breakdown of the end-user experience: designers scramble to cover all the “happy paths”, but to no avail, as there are millions of ways a conversation can unfold. This means that existing chatbots fail both in terms of the user experience and their core functional requirements.

Orchestrating Conversations with PolyAI

At PolyAI, we have a very different philosophy when it comes to building conversational AI. Although the “engineering” approach has many shortcomings, it does have its advantages over pure data-driven approaches, such as better control of bot behavior and shorter development cycles. Combining the lessons learned from this industry practice with years of research on the machine learning involved in conversational AI, we developed the PolyAI platform, a radically different alternative to existing bot-building frameworks. The core idea behind this platform is the principle of “orchestrating” conversations, instead of over-engineering them.

The core technology behind the PolyAI platform is a general-purpose conversational search engine, and a solution we call content programming. Our conversational search engine was trained on billions of past conversations to learn to resolve context, identify important social cues in dialogues, and select the most plausible response from a pool of possible answers. Once the system is deployed, it also learns as it interacts with users to master each individual domain. Since it has seen many conversations in the past, it can naturally generalize better than most of the application-specific intent classifiers or semantic parsers offered by other bot frameworks.

Content programming, on the other hand, is a proprietary engineering framework we developed at PolyAI. It allows bot builders to alter behaviors by editing content instead of crafting rules. The goal of content programming is not to give bot builders the dictatorial power to overrule the search engine’s decision whenever they see fit, but to empower the engine with application-specific data examples so it can make a collectively wiser decision.
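The general idea of selecting a response from a pool, and of changing behavior by editing content, can be illustrated with a toy sketch. This is not the PolyAI engine, which is proprietary and trained on billions of conversations. As a loudly labeled stand-in, the sketch below scores candidates with plain word-overlap cosine similarity, and every name in it (`select_response`, `content`, the example utterances) is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_response(utterance, content):
    """Return the response whose example utterance best matches the input.

    Assumption: word overlap stands in for a learned ranking model.
    """
    query = bag_of_words(utterance)
    best = max(content, key=lambda pair: cosine(query, bag_of_words(pair[0])))
    return best[1]


# Hypothetical content pool: (example user utterance, response) pairs.
content = [
    ("what time do you open", "We open at 9am every day."),
    ("i want to book a table", "Sure, for how many people?"),
]

print(select_response("when do you open tomorrow", content))
# We open at 9am every day.

# Changing behavior means adding content, not writing a new rule.
content.append(("can i bring my dog", "Yes, dogs are welcome on the terrace."))
print(select_response("is it ok to bring my dog", content))
# Yes, dogs are welcome on the terrace.
```

In this toy version the builder never touches the selection logic: adding one example pair is enough to change what the bot says for a whole family of similar questions.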

At PolyAI, we believe in data. That’s why we strive to push as much of the bot’s decision-making into our data-driven search engine so we can empower it to choose the right response. In this way, bot builders can be released from the process of predicting and handcrafting thousands of possible dialogue flows. Instead, they can focus on user experience by using content curation to craft what matters most: the content of the responses that can be offered to their users. Under this concept, the bot builder is effectively the conductor of an orchestra — she “instructs” the orchestra (conversational search engine) to produce the symphony (conversation) that the audience (users) will enjoy. Rather than micro-managing each member of the orchestra, she is tasked with the big picture — the delivery of the symphony itself (the overall user experience). She should not worry about each individual musician’s performance (i.e., specific chatbot behavior) — as long as they are all contributing to a satisfying end result.

Orchestrating Means Having More Faith in Data

I’m a strong advocate of machine learning and data-driven approaches. Over years of work in this field, I have seen the remarkable things that machines can do given a huge amount of data, but I have also witnessed their fragility when they lack the fuel (data!). Since machines can browse billions of conversations in a fraction of the time that humans would need, we should believe that machines can design a better conversational flow than we can, instead of relying on our already-internalized, subconscious model of language. Until Artificial General Intelligence is invented (which may take decades or even more), orchestration is a principle worth considering when it comes to Conversational AI development. At PolyAI, it has been the cornerstone of an approach which has empowered us to innovate and challenge the status quo.

PolyAI is a world-leading provider of conversational AI solutions for contact center automation. Our platform empowers contact centers, allowing them to deliver the next generation of customer experience at a fraction of the cost paid by their competitors. At the moment, the PolyAI platform is proprietary and we only allow access to select partners. If you are interested in learning more about it, please get in touch at: contact@poly-ai.com
