Moneyball, Statistics & Digital Assistants

Statistical analysis has revolutionized the game of baseball, to the extent that every major league team now employs a large analytics staff that drives almost every decision the organization makes. The days when a “good eye” and a “gut feel” were the deciding factors are long over. Now, it’s all about the numbers. 

The parallels to the world of digital assistants are remarkably close. Successfully implementing a digital assistant entails applying statistical techniques to both measure and improve its performance. The premise is:

“You can’t manage what you can’t measure”

– Peter Drucker

So, let’s start with the concept, and then examine the details behind the ways that the worlds of baseball and digital assistants collide.

1.   Moneyball

The lesson from Moneyball was that smart organizations could compete with the likes of the Yankees if they used their limited financial resources in ways that were much more efficient. If they could generate a hundred runs a year by spending $1M, while the Yankees were spending $10M, then they would level the playing field. Doing more with less was possible if organizations could change their traditional ways of thinking. 

And so it is true in the world of digital assistants. If somebody offered you something ten times better than what you already were paying for, and it happened to also be twenty times cheaper (at least), wouldn’t it make sense to switch to it? 

Human vs. Ida support performance
Figure 1: The new reality

This is the new world we live in. Humans are fantastic at doing certain types of things, and woefully inefficient at doing others. And digital assistants just happen to be great at what humans are poor at. Humans can’t scale infinitely. They also can’t remember thousands of key facts, or update hundreds of records in a few seconds. Humans forget things, are not always fully motivated, and no matter how much you invest in them, they will eventually leave you. 

Digital assistants are the new Moneyball. If you choose the right one, and pay heed to how you implement and grow your new digital worker, it will allow your organization to provide a better service at a tiny fraction of the cost you are paying now. 

Moneyball cartoon: digital assistants are better and cheaper
Figure 2: Moneyball for the Enterprise & Campus

2.   What is a good accuracy score?

It has been said that hitting a ball in major league baseball is the hardest thing to do in all of sports. Yes, hitting a gently tossed baseball in your back yard is something almost anyone can do. But hitting a 92 mph four-seam fastball from Clayton Kershaw? Well, that is something very few people can do. 

In the world of baseball, making contact with the ball is considered job #1 of the batter. In the world of digital assistants, being able to accurately respond to a human request is likewise considered job #1. 

In baseball, a batting average of over .300 is considered excellent, over .350 is elite, and anything close to .400 is other-worldly. With digital assistants, over .800 is excellent, .850 is elite, and .900 is other-worldly. 

Note: there are caveats to this that we will address in subsequent points.

The following chart shows the results of extensive testing (2,500 questions) of the major consumer-facing digital assistants. Voice recognition was removed as a factor, so this was purely a test of knowledge matching. More interesting still, the test measured both how often each assistant attempted to answer a question and how accurate it was when it did attempt an answer. 

Comparing the results below, you can see that some digital assistants attempt to answer (swing the bat) more often than others, and that accuracy (making contact with the ball) also varies widely. Google and Ida both hit in the 80s. Cortana attempts to answer just over 80% of the questions, but gets it right only just over 50% of the time. Meanwhile, Siri swings at the ball 40% of the time, and even then makes contact only 70% of the time, which is extremely poor. 

Accuracy rates for Ida vs. consumer virtual assistants
Figure 3: Accuracy comparisons

Please note: the scores for Ida come from production environments across multiple customers and are an accumulated average. Also, Ida is being asked questions that are specific to a client’s organization, and that may also be specific to individual employees, managers, students and advisors. Therefore, the degree of complexity is much higher.
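The two measurements described in this section can be sketched in a few lines of Python. This is a minimal illustration only, not the scoring code used for the test; the `Result` record, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    attempted: bool  # did the assistant try to answer (swing the bat)?
    correct: bool    # was the answer right (contact)? Only meaningful if attempted.


def attempt_rate(results):
    """Share of all questions the assistant tried to answer."""
    return sum(r.attempted for r in results) / len(results)


def accuracy_when_attempted(results):
    """Share of attempted questions that were answered correctly."""
    attempted = [r for r in results if r.attempted]
    if not attempted:
        return 0.0
    return sum(r.correct for r in attempted) / len(attempted)


def overall_success(results):
    """Share of all questions that ended with a correct answer."""
    return sum(r.attempted and r.correct for r in results) / len(results)


# Hypothetical sample: 10 questions, 4 attempted, 3 of those answered correctly.
sample = (
    [Result(True, True)] * 3
    + [Result(True, False)]
    + [Result(False, False)] * 6
)
```

Keeping the two rates separate matters because they multiply. Using the approximate Siri figures quoted above, a 40% attempt rate combined with 70% accuracy when attempting works out to roughly 28% of all questions ending in a correct answer.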

3.   Batting Cage Averages

There’s a reason that published batting averages only include stats captured during actual games. All players look great in the batting cage, as the degree of difficulty is much lower and, generally, the player knows exactly which kind of pitch is coming. 

Ted Williams, in 1941, was the last player to hit over .400 for a season. Almost all players hit over .400 in a batting cage. This is why, when you see published stats for digital assistant accuracy, it’s important to know that those stats came from actual usage in a production environment, and weren’t just the result of testing in a QA environment with a bunch of teed-up utterances that didn’t truly test the ability to match accurately. 

Just as an FYI, we run over 10,000 test utterances against Ida any time we make any change. In QA, Ida scores over 97% accuracy, whereas in production, in client environments, Ida scores around 85%. That’s the number we publish. 

Kershaw pitching vs. pitching machine
Figure 4: Hitting against Clayton Kershaw is a lot more difficult than a pitching machine.

4.   The Curve Ball

While it’s obvious why Clayton Kershaw is harder to hit than a pitching machine, it may not be so obvious why real people interacting with a digital assistant are so much more difficult to handle than test utterances in a QA environment. So, let’s illustrate this with some examples:

Ida Dialogue: someone asking about paycheck
Figure 5: Typical training/testing utterance

Now, let’s see an example that a human actually typed:

Ida Dialogue: someone asking about paycheck
Figure 6: Sample of an utterance by a human in a production environment

As you can see, as much as vendors try to emulate real utterances in their DEV and QA environments, what actual people say often comes out of left field and can really fool a digital assistant that lacks the technological maturity to extract the essence of what is being communicated. Complex utterances are really the equivalent of the curve ball in baseball (or splitter, slider, etc.).

5.   Swinging the Bat

So, exactly how does a digital assistant know when to try to answer a question (swing the bat) and when to claim ignorance (lay off the ball)? For many, it’s a simple case of confidence levels. When an utterance is passed to an NLP engine, what gets returned is a list of the possible things it thinks the utterance matches, plus a confidence level attached to each one. Then, for a basic digital assistant, it’s just a case of selecting the match with the highest confidence and presenting that to the human. Sometimes the digital assistant will also include, as an act of disambiguation, a list of the matches that are closely grouped in confidence level. 
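The basic approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not how Ida or any particular NLP engine works; the function name, the `min_confidence` and `disambiguation_gap` parameters, their default values, and the intent names are all hypothetical.

```python
def choose_response(matches, min_confidence=0.6, disambiguation_gap=0.1):
    """Decide what to do with the candidate matches for one utterance.

    matches: list of (intent_name, confidence) pairs, as an NLP engine
    might return them. Thresholds are illustrative assumptions.

    Returns ("answer", intent), ("disambiguate", [intents]) or ("decline", None).
    """
    if not matches:
        return ("decline", None)

    # Rank candidates from most to least confident.
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    top_intent, top_conf = ranked[0]

    # Lay off the ball: even the best match is not convincing enough.
    if top_conf < min_confidence:
        return ("decline", None)

    # Candidates grouped closely with the top match trigger disambiguation.
    close = [intent for intent, conf in ranked
             if top_conf - conf <= disambiguation_gap]
    if len(close) > 1:
        return ("disambiguate", close)

    # Swing the bat: one clear winner.
    return ("answer", top_intent)
```

For example, candidates scored 0.82 and 0.78 would produce a disambiguation prompt listing both, while a single candidate scored 0.40 would be declined.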

For really advanced digital assistants like Ida that use nested NLP techniques, much more complex algorithms are used to determine what to present and how to present it. But, ultimately, everything is a combination of either single or multiple confidence levels that may or may not be passed into even more complex algorithms to further establish what the human is asking the robot. 

A really good digital assistant doesn’t just have a high accuracy rating; it also has a high rate of responding to questions (swinging the bat). A digital assistant that only replies to really obvious requests will score high on accuracy, but low on satisfaction, and low on its ability to truly solve problems. 
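The trade-off between accuracy and response rate can be made visible by sweeping a confidence threshold over a labeled test set. The sketch below assumes a simple single-threshold assistant; the `sweep` function and the sample scores are hypothetical and are not taken from any real system.

```python
def sweep(scored, thresholds):
    """Measure response rate and accuracy at several confidence thresholds.

    scored: list of (top_confidence, top_match_was_correct) pairs from a
    labeled test set. Returns one (threshold, response_rate,
    accuracy_when_answered) tuple per threshold.
    """
    rows = []
    for t in thresholds:
        # The assistant only answers when its best match clears the threshold.
        answered = [ok for conf, ok in scored if conf >= t]
        response_rate = len(answered) / len(scored)
        accuracy = sum(answered) / len(answered) if answered else 0.0
        rows.append((t, response_rate, accuracy))
    return rows


# Hypothetical labeled results for eight utterances.
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.65, True), (0.50, False), (0.40, False), (0.30, True),
]

for threshold, rate, acc in sweep(scored, [0.3, 0.6, 0.9]):
    print(f"threshold={threshold:.1f}  response rate={rate:.3f}  accuracy={acc:.3f}")
```

With this sample data, raising the threshold from 0.3 to 0.9 lifts accuracy from 62.5% to 100%, but the response rate falls from 100% to 25%. That is the "only swings at obvious pitches" profile described above.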

6.   The Pinch Hitter

As we roll into 2021, a subject that will keep coming up at organizations is the concept of the digital assistant “concierge”. Given the inevitable plethora of bots being implemented at many organizations, typically on different technology platforms, the constant question in 2021 will be:

“Can’t we just have one digital assistant that everyone interacts with? It’s too complicated for our people to know which one to go to.” And the simple answer is yes. 

Organizations have begun to realize that one digital assistant should be the “face” of the organization, and that it is the job of the “face” to handle integration with all the other bots in the organization, even those on a different technology stack. This is called a “concierge” solution, where the digital assistant can reach out to different bots at runtime to get the answers to questions it knows nothing about. It is much like how the concierge in a hotel operates. 

Technically, and to keep with our baseball theme, “concierging” entails swapping different bots in and out of the lineup based on the question the digital assistant is faced with, just like bringing in your lefty to face the right-handed pitcher. So, for example, if the digital assistant’s key strength is HR-based requests and the human asks a finance question, the digital assistant will reach out to the bot, or skill, that can best handle it, and act as an intermediary, relaying the responses to the human. 

Ida’s underlying technology stack is based on Oracle technology, and so it is easy for Ida to concierge with any skill built on the Oracle Digital Assistant technology stack. Plus, because of the advanced nature of the stack and the middleware Ida uses to integrate with it, it is also possible to plug bots built on completely different technologies, like Microsoft LUIS or IBM Watson, into the solution too. 
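As a rough sketch of the routing idea, the concierge below asks every registered bot for an answer and relays the most confident one. This is an assumption made for illustration; the source does not describe how Ida selects a bot, and a real concierge might classify the request first rather than poll every bot. The `Bot` type, the `concierge` function, the keyword-based bots, and their canned replies are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# A "bot" here is any callable that takes an utterance and returns
# (answer_text, confidence). Real bots would sit behind their own APIs.
Bot = Callable[[str], Tuple[str, float]]


def concierge(utterance: str, bots: Dict[str, Bot],
              min_confidence: float = 0.6) -> Tuple[Optional[str], str]:
    """Ask every registered bot and relay the most confident answer."""
    best_name, best_answer, best_conf = None, "", 0.0
    for name, bot in bots.items():
        answer, conf = bot(utterance)
        if conf > best_conf:
            best_name, best_answer, best_conf = name, answer, conf

    # No bot is confident enough: the concierge declines on everyone's behalf.
    if best_conf < min_confidence:
        return (None, "I don't know the answer to that.")
    return (best_name, best_answer)


# Hypothetical keyword-based stand-ins for an HR skill and a finance skill.
def hr_bot(utterance: str) -> Tuple[str, float]:
    if "vacation" in utterance.lower():
        return ("You have 12 vacation days left.", 0.9)
    return ("", 0.1)


def finance_bot(utterance: str) -> Tuple[str, float]:
    if "invoice" in utterance.lower():
        return ("Invoice 1001 was paid on Friday.", 0.85)
    return ("", 0.1)


bots = {"hr": hr_bot, "finance": finance_bot}
print(concierge("Has my invoice been paid?", bots))
```

The human only ever talks to the concierge. Which bot actually supplied the answer is an implementation detail hidden behind it.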

7.   Laying Off the Ball

Of course, no digital assistant should be attempting to answer every question or request that it gets. If someone wants to know how tall Tom Brady is, that’s probably not a question it should be trying to respond to. In baseball, if the ball is pitched a foot outside the plate, the hitter knows enough not to swing at it. Swinging at pitches like that is how baseball players, and digital assistants, make fools of themselves. But what the digital assistant also shouldn’t be doing is just responding with a bare “I don’t know the answer to that”. 

The best way for a digital assistant to lay off the ball is for it to politely say, “I don’t know the answer to that. But here are the topics I do know a lot about. You can browse these topics, or I can pass you to a live agent if you need more help”. 

8.   Slugging Percentage

For most of this article we have focused on contact with the ball as the key metric for judging a digital assistant: was the human utterance correctly matched to something the digital assistant was capable of responding to? That is why batting average is used as a direct parallel. In reality, there is another dimension to all of this that is best described by comparison to baseball’s slugging percentage. 

In baseball, not all hits have the same value. One hit may get the player to first base, while another may be a home run. And, as everyone knows, hitting a home run is massively valuable, so slugging percentage is used as a means to measure the power of a hitter. If you combine a hitter’s ability to get on base with the power they generate with each hit, you get a stat called OPS (on-base plus slugging). And if the OPS is greater than 1.000, that means you are a superstar. 
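For readers who want the arithmetic, the standard definitions are: batting average is hits divided by at-bats, on-base percentage (OBP) adds walks and hit-by-pitches, slugging percentage (SLG) is total bases divided by at-bats, and OPS is simply OBP plus SLG. The season line below is invented purely for illustration.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """How often the batter reaches base: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat: a home run counts four times as much as a single."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


# Hypothetical season line (not a real player).
singles, doubles, triples, home_runs = 90, 30, 5, 25
hits = singles + doubles + triples + home_runs  # 150
at_bats, walks, hit_by_pitch, sac_flies = 500, 70, 5, 5

avg = batting_average(hits, at_bats)
obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats)

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops(obp, slg):.3f}")
```

This invented hitter bats an excellent .300 yet finishes with an OPS of about .918, short of the 1.000 superstar mark, which shows how much the quality of each hit matters on top of simply making contact.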

Digital assistants are exactly the same. Just being accurate (getting on base) isn’t sufficient. It’s the quality of the response that really counts (slugging percentage). 

So, what exactly do we mean by that? Let’s see an example to illustrate the point:

An example of just getting to first base:

Ida Dialogue: simply link to other website
Figure 7: Barely getting on first base

An example of the home run:

Ida Dialogue: complex transfer
Figure 8: A home run hit out of the park

As you can see, the first example provides a link to a web page, where all you can really do is initiate a request for someone to get this done for you. The second example clarifies any outstanding questions, completes the task, and triggers any appropriate workflow. This is called a home run, and it speaks to the deep integration capabilities of the digital assistant. 

As is very clear in the examples above, the first example is not much better than an improved means of navigating crude FAQ web pages. Whereas the second example shows the value a superstar digital assistant can bring to your organization. 

9.   Statistics, statistics, statistics

The world of baseball is measured with every statistic you could ever imagine. And, these days, all organizations use advanced statistics to make pretty much every decision. No human, no matter how good their “eye” is, can process 250,000 data points and make an informed decision without some kind of analytics engine to provide guidance. The same is true in the world of digital assistants. Despite almost every salesperson in the AI world telling their prospects that AI “just learns”, that’s just not true, nor would it be advisable. Once a digital assistant is introduced to your organization, every decision you make in terms of growing its knowledge, and improving its capabilities, needs to be driven by hard data. 

Yes, aspects of AI are a black box, but any black box can be measured, and by precise measurement you can predict behavior, and also alter it when needed. 

At IntraSee we ask Ida over 10,000 questions every time we add more knowledge to the corpus, plus examine a quarter of a million data points. And, because this is impossible to do by hand, we automate the entire process. This allows us to ensure Ida gets smarter each week, and that there is no regression. Without all this data it would be impossible to do that.
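An automated regression check of this kind can be sketched as a comparison of per-topic accuracy before and after a change. This is a minimal illustration of the idea, not IntraSee’s actual tooling; the function names, the `tolerance` parameter, and the intent names are hypothetical.

```python
def accuracy_by_intent(results):
    """Compute accuracy per expected intent.

    results: list of (expected_intent, predicted_intent) pairs from one
    run of the test utterances.
    """
    totals, hits = {}, {}
    for expected, predicted in results:
        totals[expected] = totals.get(expected, 0) + 1
        hits[expected] = hits.get(expected, 0) + (expected == predicted)
    return {intent: hits[intent] / totals[intent] for intent in totals}


def find_regressions(before, after, tolerance=0.0):
    """Return intents whose accuracy dropped by more than `tolerance`.

    before, after: dicts of intent -> accuracy, as produced by
    accuracy_by_intent. An intent missing from `after` counts as 0.0.
    """
    return {
        intent: (before[intent], after.get(intent, 0.0))
        for intent in before
        if before[intent] - after.get(intent, 0.0) > tolerance
    }


# Hypothetical runs of the same four test utterances, before and after a change.
baseline = [("paycheck", "paycheck"), ("paycheck", "paycheck"),
            ("pto", "pto"), ("pto", "pto")]
candidate = [("paycheck", "paycheck"), ("paycheck", "benefits"),
             ("pto", "pto"), ("pto", "pto")]

regressions = find_regressions(accuracy_by_intent(baseline),
                               accuracy_by_intent(candidate))
print(regressions)
```

Breaking the score down by intent is a deliberate choice here: an overall average can stay flat while one topic quietly gets worse, and a per-topic view surfaces that before the change ships.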

The orchestration of supervised training and measurement is the key to continuous healthy growth. Like humans, digital assistants start life being great at some things, and not so great at others. And it’s important for all organizations to understand exactly what those strengths and weaknesses are. And, just like humans, this information can be used to further train the digital assistant, and also further enhance the quality of the responses. 

With an array of statistics that are both high level for your executives, and detailed enough for your business analysts, you will have the tools to effectively manage your digital assistant. 

10.   Player Development

Most people assume that the use of statistics is confined to trading, drafting, and game strategy. But the most advanced organizations also use statistics to drive player development. It’s one thing to draft or trade for a great prospect, but unless they are coached correctly, you’ll never get the best out of them. Sometimes a tiny adjustment to the length or angle of a swing makes the difference between a good player and a great player. 

Similarly, digital assistants are not a “set and forget” solution. How you develop your digital assistant with continuous learning is what turns a good digital assistant into a great one. Supervised machine learning via automation, combined with rigorous automated testing, is the key to ensuring that your digital assistant gets smarter every single week. 

In many ways you have to view your “digital worker” in the same light as when you hire a human worker. Performance appraisals, continuous feedback, and the setting of goals are just as important to your digital assistant. And, in some ways, more so. However, unlike your human hire, your digital worker can continue to get smarter and more knowledgeable each week, with no plateau. It will work 24/7 for you, and never call in sick. Plus, it will never leave you for a different organization, taking its knowledge with it. In short, it’s an investment that keeps on paying back. 
