If 2021 has taught us anything, it is that what sounds like a consensus online often is not. Our world is dominated by algorithms whose output has shown the ability to skew our realities. Bad actors have discovered they can influence those algorithms, and they do so for financial gain or just for a laugh. Artificial Intelligence (AI) can provide great value, but AI with bias and/or inaccuracy is something we must actively guard against. This post explores the traps related to user feedback and how over-reliance on that dataset can result in poor outcomes for any AI, but especially for chatbots and digital assistants, which are the first line of support for your users.
For the purposes of this post, we will focus our examples on use cases we typically see our customers facing. Users, in this context, are the ones chatting with the bot and looking for support.
What is User Feedback?
User Feedback is a broad term meant to cover both direct and indirect feedback. Direct feedback is when the user is asked for their opinion directly and they reply. You will see this in various forms. For example, the thumbs-up and thumbs-down icons are meant to collect user feedback. You may be asked, “Did this solve your issue?” or “How would you rate this experience?”. Have you seen those buttons at a store’s exit where there is a smiley face, a sad face and something in between? That is a form of direct user feedback.
The other type of feedback is far more subtle and indirect. We can look at a user’s actions and from those infer some level of feedback. These patterns can also be called user cues. An example of such a cue is when the user gets an answer and they respond, “you stink!”. The implication is that the user is unhappy about the previous answer. Another cue can be the circumstances under which a user clicks a help button or even asks to speak to a live agent. All of these indicate something may have gone wrong.
The Feedback Challenge
There is no problem with asking for feedback. In general, it is a good practice. There are some challenges, however, so let’s explore those.
Interpreting User Intent
Interpreting the user’s intended meaning is no easy task. Let’s focus on a simple interaction to illustrate this point. With many help desk systems, upon completion of the experience, the user will be asked: “Did this solve your issue?”
The user gives a resounding “NO” to the follow-up question, “did this solve your issue?”. The problem is, we don’t really know why it didn’t solve their issue. If we present them with a big, long survey trying to find out why…well, you know no one is spending time on that. Back to the point at hand, there could be all sorts of reasons for the “NO”. For example…
They are annoyed because the bot didn’t answer directly. It simply gave a link and it is now the user’s problem to find the answer.
The user may have found the policy on borrowing PTO, but disagreed with the policy itself, thereby not solving the issue at hand.
The user may be unhappy that they are getting an answer about policies seemingly unrelated to the question, which was about how much PTO can be borrowed.
The user is a bad actor and intentionally provides inaccurate feedback.
Experts say the key to effective user feedback is acting on it. However, the confusion around user intent puts you on shaky ground when trying to act.
Selection Bias
The next problem with user feedback is that many studies suggest the data is not representative of the user community.
Anecdotal evidence from across the web suggests a typical response rate for an online survey is much lower than 10%. That means the vast majority of your customers (90%+) are not telling you what they think. You might be able to argue that away statistically, but in reality are you happy that so many of your customers don’t have a voice?
customerthermometer.com
We know user feedback tends to suffer from self-selection. That is to say, the people who choose to participate skew the data away from a true representation of the whole community. The most basic example is that unhappy people provide more feedback than happy people. This makes it very difficult to act on a dataset that lacks representation.
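To make the effect concrete, here is a minimal simulation. The numbers are made up purely for illustration, not drawn from our data: even when 80% of users are actually satisfied, if unhappy users are four times more likely to leave a rating, the ratings you collect tell a very different story.

```python
import random

random.seed(42)

# Illustrative assumptions (not measured data): 80% of users are satisfied,
# but unhappy users are four times more likely to leave a rating.
TRUE_SATISFACTION = 0.80
RESPONSE_RATE_HAPPY = 0.02
RESPONSE_RATE_UNHAPPY = 0.08

users = 100_000
ratings = []
for _ in range(users):
    happy = random.random() < TRUE_SATISFACTION
    response_rate = RESPONSE_RATE_HAPPY if happy else RESPONSE_RATE_UNHAPPY
    if random.random() < response_rate:  # only a small fraction of users respond
        ratings.append(1 if happy else 0)

observed = sum(ratings) / len(ratings)
print(f"True satisfaction:     {TRUE_SATISFACTION:.0%}")
print(f"Observed from ratings: {observed:.0%}  (n={len(ratings)})")
# Observed satisfaction lands near 50%, far below the true 80%, because the
# unhappy minority is heavily over-represented among the responses.
```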
Intentional Manipulation
Famously, in 2016 Microsoft released an AI-powered bot named Tay on Twitter, at a time when we didn’t fully understand the world of unintended consequences in AI. Without going into too much detail, let’s just say this experiment did not go well. “The more you chat with Tay,” said Microsoft, “the smarter it gets.” Have you heard this before?
It is a case where users figured out they could influence the AI and knowingly did so. We know humans are capable of this kind of manipulation, and whatever their intentions, we need to actively guard against it. So how does one tell the difference between this manipulation and genuine user feedback? If Facebook and Twitter haven’t been able to tell the difference, we should be cautious in thinking we can.
IntraSee’s Feedback Data
Across our customers, many have deployed quick feedback mechanisms like thumbs or star ratings. This feedback is non-interruptive, and the user is not forced to answer. For this type of asynchronous feedback, we are seeing a 3%-4% response rate.
We also collect feedback that is more synchronous: the user can technically ignore the prompt, but it is not readily obvious that they can continue without answering. This method gets about a 40% response rate, +/- 7%. Clearly, more feedback is gathered with this method, but it can be annoying. To counter that potential frustration, no user is asked too often. There is a delicate balance between getting feedback and being bothersome, but we feel throttling is necessary here even though it reduces the statistical significance of the data.
For one customer, the asynchronous feedback prompt (thumbs/stars) is presented roughly 7.5 times as often as the synchronous one. Doing the math, the two methods yield almost the same amount of feedback data (+/- 5%)!
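For the curious, here is a minimal sketch of what the throttling mentioned above could look like. The 14-day cool-down window, the in-memory store and the function names are illustrative assumptions, not a description of Ida’s actual implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical throttle: never show the synchronous feedback prompt to the
# same user more than once within a cool-down window.
COOL_DOWN = timedelta(days=14)
last_prompted: Dict[str, datetime] = {}

def should_prompt(user_id: str, now: Optional[datetime] = None) -> bool:
    now = now or datetime.utcnow()
    last = last_prompted.get(user_id)
    if last is not None and now - last < COOL_DOWN:
        return False  # asked recently -- skip the prompt to avoid being bothersome
    last_prompted[user_id] = now
    return True
```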
Automating AI with User Feedback
We now understand that feedback, while valuable, can produce bad outcomes if you are not careful. It is hard to collect, it is often not representative, and interpretation is rife with pitfalls. In the chatbot industry, there is a technique that takes user feedback data and feeds it straight into the AI model, but doesn’t that sound problematic when our confidence in this feedback is on shaky ground? Remember how Microsoft said to just use it and it will get better?
Machine Learning AI is the most powerful type of engine behind enterprise-grade digital assistants. That AI uses a model that is trained with data just like a Tesla uses pictures of stop signs to understand when to stop. When we hear, “just use it and it will get better,” what is really happening is the training data is improving which should yield better outcomes. That is, of course, if the training data is of high quality.
How does training data improve? Traditionally in two ways: manually, by a data scientist, or automatically. How do you automatically update training data? You need to draw upon data sources, so why not use user feedback? For example, if a user clicks the thumbs down, we can assume the AI had a bad outcome, right?
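To illustrate what that automation usually boils down to, here is a deliberately naive sketch of the approach: push thumbs ratings straight into the training set with no human review. The function name and data structures are hypothetical, but the logic is the crux of the technique.

```python
from typing import Dict, List

def apply_feedback(training_data: Dict[str, List[str]],
                   utterance: str,
                   matched_intent: str,
                   thumbs_up: bool) -> None:
    """Blindly trust the rating: reinforce on thumbs-up, unlearn on thumbs-down."""
    examples = training_data.setdefault(matched_intent, [])
    if thumbs_up:
        examples.append(utterance)   # treat the match as a confirmed training example
    elif utterance in examples:
        examples.remove(utterance)   # punish the match, even if it was actually right
```

Every weakness described above, sparse data, unrepresentative raters, misread intent and bad actors, flows straight into the model with no filter.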
It sounds like a good idea, but it can be a trap! As previously discussed, we see this data collected in fewer than 4% of interactions. Imagine you have 1,000 questions in your bot and get 10,000 user questions in a month. Even if every question were asked an equal number of times, that works out to fewer than one piece of feedback per question per month! How many months do you need to wait before the feedback has statistical significance? This effect is even more pronounced if the question is not a top-20 popular question.
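Working the example through with the rates quoted earlier in this post:

```python
monthly_questions = 10_000   # user questions asked per month (example above)
distinct_intents  = 1_000    # questions the bot knows how to answer
response_rate     = 0.04     # asynchronous thumbs/stars response rate (3%-4%)

feedback_per_month  = monthly_questions * response_rate        # 400 ratings
feedback_per_intent = feedback_per_month / distinct_intents    # 0.4 ratings

print(feedback_per_intent)   # 0.4 -- less than one rating per question per month
```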
Now consider you wait 6 months to have enough feedback to act on it automatically. What has changed in 6 months? The pandemic has taught us that everything can change! By the time you have enough data, that same data may be stale or, worse, incorrect.
This math all assumes the feedback data is good and evenly representative, but as discussed above, we know it is not. Oh my, what a mess! We now have limited data that over-represents the unhappy, and we are considering automatically amplifying their voice into the AI model?
Time for another practical example.
Do I need another vaccine?
Information about health and wellness can be found by contacting the Wellness Center at 800-555-5555.
This answer isn’t wrong, but there is a better answer that specifically covers booster shot requirements. The user doesn’t know that better answer exists, so logically they click thumbs up or answer “yes” to the question, “Did this answer your question?”
If we took this indirect user feedback and automatically fed it into the AI, we would be telling the AI you were right to give this less-than-perfect answer. The system is then automatically reinforcing the wrong outcome. Now amplify this by thousands of interactions and what happens? The AI drowns out the more helpful answer about booster shots. The end result of this slippery slope is continual degradation in the quality of service the user receives.
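Here is a toy illustration of that dynamic. The scores, the update rule and the always-pick-the-top-answer policy are all invented for the example: the merely adequate answer gets shown, gets thumbed up, and pulls further ahead, while the better booster-shot answer never surfaces and so never has a chance to collect feedback.

```python
# Illustrative only: two candidate answers compete for the vaccine question.
# The generic answer starts slightly ahead, users rate it "good enough",
# and the booster answer is never shown, so it never receives feedback.
answers = {
    "generic_wellness_number": 0.62,   # initial model confidence (assumed)
    "booster_shot_policy":     0.60,
}
LEARNING_RATE = 0.01

for interaction in range(1_000):
    shown = max(answers, key=answers.get)           # always show the top-scoring answer
    thumbs_up = shown == "generic_wellness_number"  # users accept it: it isn't wrong
    if thumbs_up:
        answers[shown] += LEARNING_RATE             # feedback reinforces that answer

print({name: round(score, 2) for name, score in answers.items()})
# {'generic_wellness_number': 10.62, 'booster_shot_policy': 0.6}
# The gap only widens; the more helpful answer is drowned out.
```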
What’s the Solution?
This is a nuanced problem we spend time thinking about so our customers don’t have to. One solution is not to abandon the human touch. The dirty little secret about Alexa and Siri is that they have thousands of people contributing to the AI by tagging real-life interactions. If Apple and Amazon still need the human touch in their AI, then it is probably for good reason.
When teachers teach students, they curate the experience. Teachers don’t simply ask students, “Do you feel you got this test question correct?” They grade those tests based on their expertise. Asking students to be the graders is flawed.
While we cannot discuss all our tricks, at IntraSee we will be introducing some new technology in 2022 directly aimed at this challenge. The lesson learned here is that while automating the data that feeds an AI model can be powerful, it is a power that comes with great responsibility. Ask your AI vendors how they solve this challenge. For our customers, these challenges are our problem at IntraSee, not yours. Rest assured, we are all over the challenges so you don’t have to spend a minute on them 😀
The next release of Ida, our digital assistant, will be available this January (2022). Clients can talk to their account teams about a deployment schedule that works for them.
21.04 in Summary
21.04 is the biggest release of Ida in over a year. Our experience over the last year has validated our opinion that having one-stop shopping for all your answers is the key to successful ROI. Running AI like Ida at scale means getting engagement from multiple departments, which need to contribute their organization-specific expertise to the solution. 21.04 is all about creating a powerful, federated AI platform to achieve scale and maximize usefulness to users while taking your return on investment to new highs.
With 21.04, our customers can run multiple unique versions of Ida while leveraging knowledge and AI models that are common across what we call organizations. An organization is any entity such as a department, a campus, a business unit, or a location. This all happens in a single tenancy, so the value proposition really accelerates with Ida.
The second pillar of this release is the ability to understand accuracy and performance over time in a fully automated cycle. No human intervention is required to understand how the machine is doing. One of Ida’s key missions is transparency around the AI’s performance, even though that is not a common approach in our industry.
Finally, a whole host of new organization-aware reports are now available, as well as a refreshed Ida dashboard, making the solution even easier to manage.
Release Notes
Ability to get to help from Disambiguation flow
Accuracy reporting without rating data
Added tracking of live NLP data
FBL Metrics Report improvements for sampling
Fixed jumping scroll on Feedback Loop pages
Greater language support for system messages
Improvements to AI automated testing
Improvements to Dashboard UX
Improvements to dynamic Hello response
Improvements to train/test labels on reports and pages
Organization support by Chat location
Report: Answers by Channel
Report: Confidence Bands
Report: Daily Conversation Summary & Feedback
Report: Location of chats based on live data
Report: Low Confidence Utterances
Report: Monthly Active Users
Report: Monthly Answers (month over month)
Report: Topics that led to frustration
Report: Updated Skeleton Report
Streamlined FAQ lookup page
Support for decentralized Ida setup
Updated Environments tile with link to release notes
UX improvement to non-prod vs. prod data and tasks
Various bug fixes to thumb rating collection and processing
Video Training Guides
Contact us below to learn more and set up your own personal demo
With the release of PeopleSoft Picaso, our clients are asking a lot of questions. It is not surprising because in the world of AI there is a lot of murkiness caused by sales and marketing messages. This post will bring some clarity so you can be well informed.
Many years ago, our digital assistant, Ida, shifted to using Oracle Digital Assistant (ODA), Oracle’s PaaS service, for its natural language processing (NLP) due to its flexibility and power. ODA is often unheralded, but its NLP draws on 40 years’ worth of enterprise data from “The” data company, Oracle. It is a unique advantage not held by competing products.
ODA is a stand-alone platform running in the cloud and has no requirement to be used with Oracle applications. An entirely separate team at Oracle runs the ODA division. As with Oracle’s database, Oracle’s applications can provide added value by leveraging the company’s own technology. HCM Cloud, ERP Cloud and now PeopleSoft products are delivering functionality, called “skills”, for ODA customers.
PeopleSoft’s skills are deployed under the brand “Picaso”, and this post is all about Picaso: what it is, who should use it and the value it provides.
What is Picaso?
Picaso is the name given to the chat UI widget (the actual window you chat in within PeopleSoft) that consumes multiple PeopleSoft-delivered skills (currently from HCM and FSCM). Just like Workday’s or Salesforce’s chatbots, it is meant to extend the application’s functionality. PeopleSoft Picaso, however, is built on a far more powerful platform than the aforementioned products.
The purpose of Picaso is to provide a conversational user interface (UI) to PeopleSoft, which means accessing PeopleSoft data and even conducting a few self-service transactions, like registering a PTO day.
What do I get?
Currently, Picaso comes with a web-based channel inside Fluid pages. Classic is not supported. When a PeopleSoft user logs in, they will see a chat icon they can engage with. PeopleSoft then delivers skills via the ODA Skill Store. The current skills as of the Fall of 2021 are:
Absence Skill (HCM)
Benefits Skill (HCM)
Payroll for North America Skill (HCM)
Requisition Inquiry Skill (FSCM)
Expense Inquiry Skill (FSCM)
Employee Directory Skill (HCM)
More information about these skills can be found in the Picaso documentation. These skills are built to access data from these modules and, in some cases, process transactions as pictured below. There are minimum PeopleTools version and PUM image requirements, so be sure to check those out.
PeopleSoft PICASO HCM Absence Skill
How much does it cost?
In the PeopleSoft world, you pay for the application (HCM) and you get the technology included (PeopleTools). Digital Assistants work in the opposite manner. Picaso and its skills are free for PeopleSoft customers, but you need to purchase the PaaS cloud service for ODA. Implementing Picaso may come with additional consulting fees should you need assistance.
How do I implement Picaso?
Implementing Picaso involves a few key steps:
License ODA
Set up your OCI tenancy and ODA instance, and provision access
Install the PeopleSoft skills and set up a channel
Set up connectivity to your PeopleSoft environment (it must be accessible on the internet)
Configure PeopleSoft to allow integration with ODA
Create a Digital Assistant, add the Picaso skills, and set up the channel integration
Train and test the bot
Deploy
In our experience, step 4 tends to be the biggest stumbling block for customers and requires a cloud architect to fully understand. Additionally, to get the most out of Picaso, having AI expertise on hand is invaluable. If you don’t have these roles available, you can enlist an Oracle partner, such as IntraSee, to help.
What is the difference between Ida and Picaso?
Customers are coming to us and asking: should I implement Ida or Picaso? This is really a false comparison. Each product has a purpose and goes after different value propositions. They are more complements to each other than they are an either/or choice.
Let’s start with what is not different between Ida and Picaso. Both use the ODA NLP engine as a basis for the machine’s understanding of human language. They are seeded with very different training data, but the engine is the same. Both require licensing of the ODA PaaS service (Ida embeds this license within its pricing).
Picaso was built with a focus on PeopleSoft. It is present in the Fluid Web UI and helps PeopleSoft users get to data, pages and transactions. It is a great step into digital assistants if your needs are focused on PeopleSoft. With Picaso you will need to handle the AI Ops such as log/analytics monitoring, migrations, uptime, learning, etc. You do have the option of using a partner to manage these services and IntraSee is one such option.
Ida is meant to be a one-stop shop for users including policies, content, data, workflow, transactions, integrations and analytics. PeopleSoft is merely a sliver of Ida’s value proposition. Ida has customers whose use is approaching 1,000 skills, so the audience is broader, with the scale to match. Ida is often found on many web pages outside PeopleSoft, including SharePoint or CMS systems, and in channels like Microsoft Teams. Ida is also available whether the user is authenticated or not.
Ida’s integrations are a key part of the one-stop philosophy. Ida has a catalog of integrations including Salesforce, ServiceNow, HCM Cloud, PeopleSoft, Google, Microsoft Teams and Office 365, Canvas, Taleo, Kronos and others including the ability to configure custom integrations.
Finally, AI Ops is a critical part of your project’s success. AI Ops teams are often made up of data scientists, AI architects and computational linguists. Despite what some marketing teams may tell you, AI isn’t magic; it needs human cultivation to achieve superior accuracy. With Ida, we automate many of these human tasks and include managed services to fill these roles, so you don’t have to (illustrated below). The budget saved on these salaries or consulting fees alone makes an Ida project an ROI winner.
An illustration of Ida’s Value Added features as of 21.03
Can Picaso and Ida work together?
Because both Picaso and Ida run on ODA, Ida can consume skills from Picaso and include them in a one-stop-shop chat. You can get the best of both options and they can leverage the same ODA license, so the cost is only incremental.
The most efficient path is to roll out Picaso and Ida skills at the same time. This path allows you to tune the machine learning model with the broader scope in mind from the get-go. The alternative requires regression analysis and re-tuning you could have avoided. That doesn’t mean you can’t have a gap between implementing Picaso and Ida, but it is not the most cost/time efficient path.
Which one is the right choice for me?
If your objective is to start with a focused use-case, get some experience and add functionality to a PeopleSoft Fluid deployment, Picaso can be a great fit for you. If your mission is to drive ROI through automation at an enterprise level, then check out Ida. For most clients, we implement in 6 weeks and are in production after 12 weeks.
The next release of Ida, our digital assistant, will be available for customers beginning at the end of September 2021. This post contains the highlights of this release, which focused on automation and improving the ROI of digital assistants.
21.03 in Summary
21.03 includes many fixes and enhancements with a focus on the Feedback Loop process and the new Oracle Digital Assistant (ODA) NLP module. We are making the process of giving the machine feedback easier, more streamlined and efficient so it takes you less time each cycle. Ida’s Feedback Loop is a differentiating feature which drives its high accuracy marks.
Ida also adds improved automation to accuracy tracking and will now warn you when accuracy starts to slip or a concern is growing. This saves you from having to analyze reports; instead, Ida grabs your attention only when it is needed.
Improved user experience features are also part of this release, such as the new ability to rate, trap and provide custom outcomes when users are expressing frustration, for example handing them off to a live agent when you sense their frustration growing.
The Ida library was also updated and we now have hundreds of pre-built skills for employees, managers, students, faculty, advisors and guests. You can read more detailed release notes below.
Release Notes
Bulk Import/Export Utility for FAQ+
Streamlined CV Configuration Page
Streamlined bot build process
Updated Ida catalog & training data
Autotest support for negative testing
Improved Feedback Loop labeling/help
Ability to record who initiated a chat via handoff
Improvements to Lightbox UX in Feedback Loop pages
Adjust bot export process to allow “empty” NLP Segments
Automated topic-level training data management
Improved QA autotest output
Last Accuracy KPI Dashboard Tile
Accuracy Leak KPI Dashboard Tile
Streamlined UI and choices for Feedback Ratings
Feedback Loop: New unrated option for ‘too vague to rate’
Trap frustration utterances and direct accordingly
Fixed show utterances in FBL in some use cases
Improved thumbs rating UI
Metrics Report: Sort location data by conversation count
Add DE branching logic support for auto-suggest inputs
Friendly not authorized message when guest-to-auth handoff fails
MS Teams Authentication updates
Improved translation performance for HTML responses
Improved filtering and defaults for Feedback Loop
Allow small talk decoupling in a digital assistant setting
Support for custom disambiguation response pre-text
Updated Feedback Loop Metrics Report to accommodate data model changes
Scheduled archiving of old Feedback Loop and Autotest data
jQuery conflict resolution for chat ui
Contact us below to learn more and set up your own personal demo:
Ida 21.02 will be released and available for customers beginning at the end of this month. Here are the highlights in this release as we continue to fine-tune the incredible accuracy performance we see from Ida as well as make various bug fixes and improvements.
No-match text is now configurable
Feedback Loop language toggle button styling changes (see both English and native languages used in this tool)
Capturing auto-utterance & initiator analytics to better understand who is handing off to Ida
Add Oracle ODA 20.09+ NLP model support for improved accuracy
Manual chat-ui language setting to force a specific language vs. auto-detection
Pruning features for audit data
Pruning & archive features for chat data
Check skill version against IUC version prior to testing
FBL Row Padding Fixes
Clean up Thumbs UI/CSS/HTML
Support separate DE processes for Help FAQ and NLP Failure process
Easy On/Off for Thumbs user satisfaction ratings
Audit reports for blank lines in answers
New 80/20 split administration page for training/testing data sets
Feedback Loop now filters at the server for improved filtering performance
Fixed MS Teams Variable Error
Now Capturing MS Bot User ID values
Product Update Notes
The focus for this release was to continue to improve NLP and utterance-matching performance even beyond the 90% mark most of our clients are seeing. The central part of this improvement is support for updated NLP models. As part of this support, the automated regression testing was significantly changed to more closely model real life, thereby ensuring better quality assurance.
A series of features were added to understand how Ida plays in the larger context of an enterprise by tracking any referrals it gets, where it is being used and what channels it is running on (such as Microsoft Teams).
Next we have features added for better multi-language support such as now having a choice between a configured language for a user vs. auto-detection. Additionally, the language being used can now be passed to integrated systems for an end-to-end experience in your language.
We continue to add more skills to the library of Ida. Clients can get up and running quickly by using this catalog. Recently we have been adding content around the return to work/campus. Finally, many bug fixes, performance improvements and other minor updates are included.
Contact us below to learn more and set up your own personal demo: