If 2021 has taught us anything, it is that what sounds like a consensus online is often not. Our world is dominated by algorithms whose output has shown the ability to skew our realities. Bad actors have discovered they can influence algorithms, and they do so for financial gain or just for a laugh. Artificial Intelligence (AI) can provide great value, but biased or inaccurate AI is something we must actively guard against. This post explores the traps related to user feedback and how overreliance on that dataset can result in poor outcomes for any AI, but especially for chatbots and digital assistants, which are the first line of support for your users.
For the purposes of this post, we will focus our examples on use cases we typically see our customers facing. Users, in this context, are the ones chatting with the bot and looking for support.
What is User Feedback?
User Feedback is a broad term meant to cover both direct and indirect feedback. Direct feedback is when the user is asked for their opinion directly and they reply. You will see this in various forms. For example, the thumbs up and down icons are meant to collect user feedback. You may be asked, “Did this solve your issue?” or “How would you rate this experience?”. Have you seen those buttons at a store’s exit where there is a smiley face, a sad face and something in between? That is a form of direct user feedback.
The other type of feedback is far more subtle and indirect. We can look at a user’s actions and from those infer some level of feedback. These patterns can also be called user cues. An example of such a cue is when the user gets an answer and they respond, “you stink!”. The implication is that the user is unhappy about the previous answer. Another cue can be the circumstances under which a user clicks a help button or even asks to speak to a live agent. All of these indicate something may have gone wrong.
The Feedback Challenge
There is no problem with asking for feedback. In general, it is a good practice. There are some challenges, however, so let’s explore those.
Interpreting User Intent
Interpreting the user’s intended meaning is no easy task. Let’s focus in on a simple interaction to illustrate this point. With many help desk systems, upon completion of the experience, the user will be asked: Did this solve your issue?
Let’s imagine a digital assistant experience…
How much PTO can I borrow?
All our policies can be found in the Policy Center.
The user gives a resounding “NO” to the follow-up question, “Did this solve your issue?”. The problem is, we don’t really know why it didn’t solve their issue. If we present them with a big, long survey trying to find out why…well, you know no one is spending time on that. Back to the point at hand, there could be all sorts of reasons for the “NO”. For example…
- They are annoyed because the bot didn’t answer directly. It simply gave a link and it is now the user’s problem to find the answer.
- The user may have found the policy on borrowing PTO, but disagreed with the policy itself, thereby not solving the issue at hand.
- The user may be unhappy that they are getting an answer about policies seemingly unrelated to the question which was about how much PTO can be borrowed.
- The user is a bad actor and intentionally provides inaccurate feedback.
Experts say the key to effective user feedback is acting on it. However, the confusion around user intent puts you on a steep slope when trying to act.
The next problem with user feedback is that many studies suggest the data is not representative of the user community.
Anecdotal evidence from across the web suggests a typical response rate for an online survey is much lower than 10%. That means the vast majority of your customers (90%+) are not telling you what they think. You might be able to argue that away statistically, but in reality are you happy that so many of your customers don’t have a voice? (Source: customerthermometer.com)
We know user feedback tends to have a self-selecting effect. That is to say, the people who participate skew the data away from a true representation of the whole community. The most basic example of this is that unhappy people provide more feedback than happy people. This makes it very difficult to act on a dataset which lacks representation.
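To make the self-selection effect concrete, here is a minimal sketch with made-up numbers. The satisfaction share and the two response rates are assumptions chosen for illustration, not measurements from any customer.

```python
def observed_satisfaction(true_happy_share, happy_response_rate, unhappy_response_rate):
    """Share of received feedback that is positive, given who chooses to respond."""
    happy_votes = true_happy_share * happy_response_rate
    unhappy_votes = (1 - true_happy_share) * unhappy_response_rate
    return happy_votes / (happy_votes + unhappy_votes)

# Hypothetical inputs: 80% of users are happy, but unhappy users
# respond four times as often (8% versus 2%).
share = observed_satisfaction(0.80, 0.02, 0.08)
print(f"True satisfaction: 80%, observed in feedback: {share:.0%}")
```

Under these assumed rates, a community that is 80% happy looks evenly split in the feedback data, which is why acting on the raw counts can mislead.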
Famously, Microsoft released an AI-powered bot named Tay on Twitter in 2016, a time when we didn’t fully understand the world of unintended consequences in AI. Without too much detail, let’s say this experiment did not go well. “The more you chat with Tay, said Microsoft, the smarter it gets.” Have you heard this before?
It is a case where users figured out they could influence the AI and they knowingly did so. We know humans are capable of this manipulation. We can only speculate about their intentions, but either way we need to actively guard against it. So how does one know the difference between this manipulation and genuine user feedback? If Facebook and Twitter haven’t been able to tell the difference, we should be cautious in thinking we can.
IntraSee’s Feedback Data
Across our customers, many have deployed quick feedback mechanisms like thumbs or star ratings. This feedback is non-interruptive, and the user is not forced to answer. For this type of asynchronous feedback, we are seeing a 3%-4% response rate.
We also collect feedback that is more synchronous: the user can ignore it, but it is not readily obvious that they can continue without providing feedback. This method is getting about a 40% response rate, +/- 7%. Clearly, more feedback is gathered with this method, but it can be annoying. To counter that potential frustration, no user is asked too often. There is a delicate balance between getting feedback and being bothersome, but we feel throttling is necessary here even though it reduces the data significance.
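One simple way to picture that throttling is a per-user minimum gap between prompts. The sketch below is an illustration only: the class name, the seven-day gap and the in-memory store are all assumptions, not a description of how any product implements it.

```python
from datetime import datetime, timedelta

class FeedbackThrottle:
    """Decides whether a user may be shown a synchronous feedback prompt."""

    def __init__(self, min_gap=timedelta(days=7)):  # assumed gap, not a recommendation
        self.min_gap = min_gap
        self._last_asked = {}  # user id -> time of the last prompt shown

    def should_ask(self, user_id, now):
        last = self._last_asked.get(user_id)
        if last is not None and now - last < self.min_gap:
            return False  # asked too recently, stay out of the way
        self._last_asked[user_id] = now
        return True

throttle = FeedbackThrottle()
start = datetime(2021, 12, 1, 9, 0)
print(throttle.should_ask("user-1", start))                      # first prompt is allowed
print(throttle.should_ask("user-1", start + timedelta(days=1)))  # too soon, skipped
print(throttle.should_ask("user-1", start + timedelta(days=8)))  # gap has elapsed
```

The longer the gap, the less bothersome the bot is, and the fewer data points you collect per user.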
For one customer, the asynchronous feedback prompt (thumbs/stars) is presented 7.5 times as often as the synchronous one. Doing the math, we get almost the same amount (+/- 5%) of feedback data from both models!
Automating AI with User Feedback
We now understand that feedback, while valuable, can produce bad outcomes if you are not careful. It is hard to collect, it is often not representative, and its interpretation is rife with miscalculations. In the chatbot industry, there is a technique which takes user feedback data and feeds it into the AI model, but doesn’t that sound problematic when our confidence in this feedback is on shaky ground? Remember how Microsoft said to just use it and it will get better?
Machine Learning AI is the most powerful type of engine behind enterprise-grade digital assistants. That AI uses a model that is trained with data just like a Tesla uses pictures of stop signs to understand when to stop. When we hear, “just use it and it will get better,” what is really happening is the training data is improving which should yield better outcomes. That is, of course, if the training data is of high quality.
How does training data improve? Two traditional ways: manually by a data scientist or automatically. How do you automatically update training data? You need to draw upon data sources, so why not use user feedback? For example, if a user clicks the thumbs down, we can assume the AI had a bad outcome, right?
It sounds like a good idea, but it can be a trap! As previously discussed, we see this data collected in fewer than 4% of interactions. Imagine you have 1,000 questions in your bot and get 10,000 user questions in a month. If every question were asked equally often, each would be asked 10 times, which at a 4% response rate yields only 0.4 pieces of feedback per question per month! How many months do you need to wait before the feedback has data significance? This effect is even more pronounced if the question is not a top 20 popular question.
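The arithmetic is easy to check with the figures from this example (1,000 questions, 10,000 asks per month, a 4% response rate). In the sketch below, the target of 30 feedback items per question is purely an assumed threshold for illustration; choose whatever sample size you would consider meaningful.

```python
import math

def feedback_per_question(monthly_asks, question_count, response_rate):
    """Expected feedback items per question per month, assuming asks are spread evenly."""
    return monthly_asks / question_count * response_rate

def months_to_reach(target, per_month):
    """Whole months needed to collect `target` feedback items per question."""
    return math.ceil(target / per_month)

per_month = feedback_per_question(10_000, 1_000, 0.04)
TARGET = 30  # assumed minimum sample per question, not an established standard

print(f"{per_month:.1f} feedback items per question per month")
print(f"{months_to_reach(TARGET, per_month)} months to reach {TARGET} items")
```

With an even spread, each question collects well under one piece of feedback a month, so any reasonable threshold is years away. In practice the spread is uneven, so popular questions get there sooner and the long tail may never get there at all.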
Now consider you wait 6 months to have enough feedback to act on it automatically. What has changed in 6 months? The pandemic has taught us that everything can change! By the time you have enough data, that same data may be stale or, worse, incorrect.
This math all assumes feedback data is good and evenly representative, but as discussed above, we know it is not. Oh my, what a mess! We now have limited data, and it is overrepresented by the unhappy and we are considering automatically amplifying their voice into the AI model?
Time for another practical example.
Do I need another vaccine?
Information about health and wellness can be found by contacting the Wellness Center at 800-555-5555.
This answer isn’t wrong, but there is a better answer which specifically talks about booster shot requirements. The user doesn’t know this answer exists, so logically they click thumbs up or answer “yes” to the question, “did this answer your question?”
If we took this user feedback and automatically fed it into the AI, we would be telling the AI it was right to give this less-than-perfect answer. The system is then automatically reinforcing the wrong outcome. Now amplify this by thousands of interactions and what happens? The AI drowns out the more helpful answer about booster shots. The end result of this slippery slope is continual degradation in the quality of service the user receives.
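A toy simulation shows how that reinforcement loop can lock in the weaker answer. Everything here is hypothetical: the answer names, the starting scores, the thumbs-up rates and the scoring rule are invented for illustration and do not describe how any real engine ranks answers.

```python
# Hypothetical scoring loop: the bot shows whichever answer has the highest
# score, and feedback only ever adjusts the answer that was actually shown.
scores = {"wellness_center_referral": 1.0, "booster_shot_policy": 0.9}

# Assumed thumbs-up rates if each answer were shown. The booster answer would
# please more users, but users can only rate what they actually see.
thumbs_up_rate = {"wellness_center_referral": 0.6, "booster_shot_policy": 0.9}

FEEDBACK_EVENTS = 1_000
for _ in range(FEEDBACK_EVENTS):
    shown = max(scores, key=scores.get)
    scores[shown] += thumbs_up_rate[shown]  # expected reinforcement per event

print(scores)
```

The referral answer starts slightly ahead, so it is the only one shown and the only one that collects thumbs up. The better answer keeps its starting score and never gets a chance to prove itself.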
What’s the Solution?
This is a nuanced problem we spend time thinking about so our customers don’t have to. One solution is to not abandon the human touch. The dirty little secret about Alexa and Siri is that they have thousands of people contributing to the AI by tagging real life interactions. If Apple and Amazon still need the human touch in their AI, then it is probably for good reason.
When teachers teach students, they are curating the experience. Teachers don’t simply ask students, “do you feel you got this test question correct?”. They are grading those tests based on their expertise. Asking students to be the grader is flawed.
While we cannot discuss all our tricks, at IntraSee we will be introducing some new technology in 2022 directly aimed at this challenge. The lesson learned here is that while automating the data that feeds an AI model can be powerful, it is a power that comes with great responsibility. Ask your AI vendors how they solve this challenge. For our customers, these challenges are our problem at IntraSee, not yours. Rest assured, we are all over the challenges so you don’t have to spend a minute on them 😀