The Voxco Answers Anything Blog
Read on for more in-depth content on the topics that matter and shape the world of research.
Inspire. Learn. Create.
Text Analytics & AI
Authentication with Ascribe APIs
The various Ascribe™ APIs, such as the Ascribe Coder API, share a common authentication operation. You authenticate with the API by a POST to the /Sessions resource. To interact with any of the other resources in the APIs you must include a token returned in the POST response in an HTTP header.
Obtaining an Authentication Token
To obtain an authentication token, POST to the /Sessions resource. The body of the POST request contains the login credentials for an Ascribe user: Account, Username, and Password. The body of the POST request has this form:

{
  "account": "MyAscribeAccount",
  "userName": "apiuser",
  "password": "--------"
}

Here we are authenticating with the user name apiuser on the account MyAscribeAccount. We recommend that you create an Associate in Ascribe exclusively for the API. Because the time accounting features of Ascribe attribute activity initiated via the API to this user, this makes interpretation of the time accounting reports easier.

The POST response has this form:

{
  "authenticationToken": "0e2338c4-adb4-474a-a26a-b9dbe1b32ac8",
  "bearerToken": "bearer eyJhbGciOiJSUzI1NiIsInR5cC ... WiiVzXPhdgSWNvH1hNOIirNmg"
}

The value of the bearerToken property is actually much longer than shown here. You can use either of these tokens (but not both) in subsequent operations with the API.
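As a sketch, the session request can be composed with Python's standard library. The host is taken from the raw requests shown later in this article; the full URL scheme and the helper name are our own assumptions, not part of the documented API:

```python
import json
import urllib.request

def build_session_request(account, user_name, password):
    """Build (but do not send) the POST to the /Sessions resource."""
    body = json.dumps({
        "account": account,      # the Ascribe account name
        "userName": user_name,   # the Associate created for API use
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://webservices.languagelogic.net/Sessions",  # assumed full URL
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_session_request("MyAscribeAccount", "apiuser", "--------")
# urllib.request.urlopen(req) would return the JSON body containing
# both the authenticationToken and the bearerToken.
```

The request is only constructed here, not sent, so the example stays self-contained.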
Differences between the Token Types
The authenticationToken returned by the /Sessions resource expires after 30 minutes of inactivity with the API. In other words, you must perform an operation with the API at least once every 30 minutes; after that you must obtain another token from the /Sessions resource. The token remains valid indefinitely as long as you make a request at least once every 30 minutes.

The bearerToken does not expire. Once you have obtained a bearerToken it remains valid until the login credentials of the user are changed.
Using Authentication Tokens
To use the authenticationToken returned by the /Sessions resource, include it in an HTTP header. The key of the header is authentication, and the value is the authenticationToken. The raw HTTP request looks like this:

GET /coder/studies
accept: application/json, text/json, application/xml, text/xml
authentication: 207e5be1-c3fa-4ff6-bbc7-475a367ec095
cache-control: no-cache
host: webservices.languagelogic.net
accept-encoding: gzip, deflate

To use the bearerToken, use an HTTP header whose key is "authorization" and whose value is the bearerToken. The raw HTTP request looks like this:

GET /coder/Studies
accept: application/json, text/json, application/xml, text/xml
authorization: Bearer eyJhbGciOiJSUzI1NiI ... lJqc7nPxcFdU3s90BkOPlPF6wTxkzA
cache-control: no-cache
host: webservices.languagelogic.net
accept-encoding: gzip, deflate
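A minimal sketch of attaching one token or the other as an HTTP header, again with Python's standard library; the helper name and the exactly-one-token check are our own illustration:

```python
import urllib.request

def with_auth(url, authentication_token=None, bearer_token=None):
    """Build a request carrying exactly one of the two token types."""
    if (authentication_token is None) == (bearer_token is None):
        raise ValueError("supply exactly one token, not both")
    headers = {"Accept": "application/json"}
    if authentication_token is not None:
        # session token goes in the "authentication" header
        headers["authentication"] = authentication_token
    else:
        # bearer token goes in the "authorization" header;
        # its value already begins with "bearer "
        headers["authorization"] = bearer_token
    return urllib.request.Request(url, headers=headers)

req = with_auth("https://webservices.languagelogic.net/coder/studies",
                authentication_token="207e5be1-c3fa-4ff6-bbc7-475a367ec095")
```

As above, the request is built but not sent, so no credentials are needed to try it.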
Summary
The Ascribe APIs require the remote client to authenticate. The remote client authenticates by providing user credentials, and then uses one of the two returned tokens in subsequent operations with the API. See the Ascribe Coder API documentation and the Ascribe CX Inspector API documentation for more information.
8/21/18
Text Analytics & AI
Multilingual Customer Experience Text Analytics
If your company has customers in more than one country you are likely faced with analysis of customer feedback in multiple languages. CX Inspector can perform text analytics in several languages and can translate customer responses in any language to one of the supported analysis languages.
Supported Analysis Languages
CX Inspector can perform text analytics in these languages:
- Arabic
- Chinese (both Simplified and Traditional)
- English
- French
- German
- Italian
- Japanese
- Korean
- Portuguese
- Russian
- Spanish
The analysis language that you select determines the language you will see when looking at the Inspection. If the comments you are loading are not in the language that you select for the analysis, then the comments will be translated to that language.
Loading Multilingual Comments
CX Inspector detects the language in each question when you load new data. For example, if I load Russian comments, CX Inspector presents the analysis language options accordingly.
CX Inspector has detected that the comments are Russian and will not translate the responses, because the selected analysis language is also Russian. If CX Inspector has incorrectly determined that all the comments are Russian, I can tell it so by checking the Force translation box. When I load the comments with Russian selected as the analysis language, the Inspection results will be presented in Russian.
If I instead select German as the analysis language, CX Inspector will translate the comments to German and then perform text analytics in German. The Inspection results will be presented in German.
During translation CX Inspector performs language detection on each comment individually, so it is fine to have a mixture of different languages in the comments.
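The per-comment detection described above can be pictured as a simple pipeline: detect each comment's language and translate only those not already in the analysis language. The sketch below is our own illustration; the detector and translator are stand-ins, not the ones CX Inspector actually uses:

```python
def prepare(comments, analysis_language, detect, translate):
    """Translate each comment to the analysis language only when needed."""
    prepared = []
    for text in comments:
        lang = detect(text)                      # detection happens per comment
        if lang != analysis_language:
            text = translate(text, analysis_language)
        prepared.append(text)
    return prepared

# Toy detector/translator for illustration only.
detect = lambda text: "de" if "sehr" in text else "en"
translate = lambda text, to: f"[{to}] {text}"

result = prepare(["sehr gut", "great service"], "en", detect, translate)
```

Because detection runs per comment, a mixed-language batch is handled comment by comment, as the text above notes.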
Multilingual Analysis Scenarios
Let’s look at three scenarios:
- Jean is a researcher in France, with data from a survey conducted in French only. His client is a French company.
- Albert is a researcher in the U.K. His client is a London bank that is considering a German presence. He has conducted a survey in Germany for that client.
- Samantha is a researcher in the U.S. Her client is a global CPG company based in the Midwest. She has conducted a survey in 12 countries for that client.
What is the best approach for each of these researchers using Inspector?
Jean speaks French, his client is French, and he has French data. Jean clearly should select French as the language when running an analysis. He does not need the translation features of CX Inspector.
Albert is in a different position. He could perform an analysis in German. But what use is that? If Albert speaks German, he might be able to interpret the results. But what about Albert’s client in the U.K.? The client probably wants to see the analysis, and the comments, in English. Albert is probably better off translating the responses to English, then running the analyses in English. He can automatically translate the responses, then do all his work in English. Albert can now present the results to his client in English.
Samantha has a more complex problem. Like Albert, she will likely want to perform analyses in English. But she has 12 languages to deal with. Samantha can simply select English as the analysis language. The language of each comment will be detected at translation time; all the comments will be translated to English, analyzed in English, and presented in English.
How many languages are supported?
CX Inspector can perform text analytics and present its results in the languages listed at the top of this article. But CX Inspector can work with comments in any language, provided that the comments are first translated automatically to one of the supported analysis languages.
8/20/18
Text Analytics & AI
Verbatim Coding for Open Ended Market Research
Coding Open-Ended Questions
Verbatim coding is used in market research to classify open-end responses for quantitative analysis. Often, verbatims are coded manually in software such as Excel; however, there are verbatim coding solutions and coding services available to streamline this process and easily categorize verbatim responses.
Survey Research is an important branch of Market Research. Survey research poses questions to a constituency to gain insight into their thoughts and preferences through their responses. Researchers use surveys and the data for many purposes: customer satisfaction, employee satisfaction, purchasing propensity, drug efficacy, and many more.
In market research, you will encounter terms and concepts in data specific to the industry. We will share some of those with you here, and the MRA Marketing Research Glossary can help you understand any unknown terms you encounter later.
Seeking Answers by Asking Questions
Every company in the world has the same goal: they want to increase their sales and make a good profit. For most companies, this means they need to make their customers happier — both the customers they have and the customers they want to have.
Companies work toward this goal in many ways, but for our purposes, the most important way is to ask questions and plan action based on the responses and data gathered. By “ask questions,” we mean asking a customer or potential customer about what they care about and taking action based on the customer feedback.
One way to go about this is to simply ask your customers (or potential customers) to answer open ended questions and gather the responses:
Q: What do you think about the new package for our laundry detergent?
A: It is too slippery to open when my hands are wet.
This technique is the basis of survey research. A company can conduct a questionnaire to get responses by asking open ended questions.
In other cases, there may be an implied question and response. For example, a company may have a help desk for their product. When a customer calls the help desk there is an implied question:
Q: What problem are you having with our offering?
The answers or responses to this implied question can be as valuable as (or more valuable than!) answers and responses to survey questions.
Thinking more broadly, the “customer” does not necessarily have to be the person who buys the company’s product or service. For example, if you are the manager of the Human Resources department, your “customers” are the employees of the company. Still, the goal is the same: based on the feedback or response from employees, you want to act to improve their satisfaction.
Open, Closed, and Other Specify
There are two basic types of questions used to gather responses in survey research: open and closed. We also call these open-end and closed-end questions.
A closed-end question is one where the set of possible responses is known in advance. These are typically presented to the survey respondent, who chooses among them. For example:
Open-end questions ask for an “in your own words” response:
The response to this question will be whatever text the user types in her response.
We can also create a hybrid type of question that has a fixed set of possible responses, but lets the user give a response that is not in the list:
We call these Other Specify questions (O/S for short). If the user types a response to an O/S question it is typically short, often one or two words.
Just as we apply the terms Open, Closed, and O/S to questions, we can apply these terms to the answers or responses. So, we can say Male is a closed response, and The barista was rude is an open response.
What is an Answer vs a Comment?
If you are conducting a survey, the meaning of the term answer is clear. It is the response given by the respondent to the question posed. But as we have said, we can also get "answers" to implied questions, such as responses to what a customer tells the help desk. For this reason, we will use the more generic term comment to refer to any text or responses that we want to examine for actionable insight.
In most cases, comments are electronic text, but they can also be images (handwriting) and voice recording responses.
You need to be aware of some terminology that varies by industry. In the marketing research industry, a response to a question is called either a response or a verbatim. So, when reading data in survey research we can call these responses, verbatims, or comments interchangeably. They are responses to open-end questions. As we will see later, we don’t call the responses to an open-end question answers. We will find that these verbatims are effectively turned into answers by the process of verbatim coding.
Outside of survey research, the term verbatim is rarely used. Here the term comment is much more prevalent. In survey research the word verbatim is used as a noun, meaning the actual text given in response to a question.
Survey Data Collection
In the survey research world, verbatims are collected by fielding the survey. Fielding a survey means putting it in front of a set of respondents and asking them to read it and fill it out.
Surveys can be fielded in all sorts of ways. Here are some of the different categories of surveys marketing research companies might be using:
- Paper surveys
- Mailed to respondents
- Distributed in a retail store
- Given to a customer in a service department
- In-person interviews
- In kiosks in shopping malls
- Political exit polling
- Door-to-door polling
- Telephone interviews
- Outbound calling to households
- Quality review questions after making an airline reservation
- Survey by voice robot with either keypad or voice responses
- Mobile device surveys
- Using an app that pays rewards for completed surveys
- In-store surveys during the shopping experience
- Asking shoppers to photograph their favorite items in a store
- Web surveys
- Completed by respondents directed to the survey while visiting a site
- Completed by customers directed to the survey on the sales receipt
There are many more categories of survey responses. The number of ways to field surveys the ingenious market research industry has come up with is almost endless.
As you can see, the form of the data collected can vary considerably. It might be:
- Handwriting on paper
- Electronic text
- Voice recording responses
- Electronic data like telephone keyboard button presses
- Photographs or other images
- Video recording responses
And so on. In the end, all surveys require:
- A willing respondent
- A way of capturing the responses
The second, capturing the responses, is easy. The first takes us to the topic of sample, which we will consider soon.
Looping and Branch Logic
Data collection tools can be very sophisticated. Many data collection tools have logic built in to change the way that the survey is presented to the respondent based on the data or responses given.
Suppose for example you want to get the political opinions of Republican voters. The first question might ask the respondent for his political party affiliation. If he responds with an answer other than "Republican," the survey ends. The survey has been terminated for the respondent, or the respondent is termed. This is a simple example of branch logic. A more sophisticated example would be to direct the respondent to question Q11 if she answers A, or to question Q32 if she answers B.
Another common bit of data collection logic is looping. Suppose we make our respondents participate in an evaluation of five household cleaning products. We might have four questions we want to ask the respondents about each product, the same four for each product. We can set up a loop in our data collection tool. It loops through the same four questions five times, once for each product.
There are many more logic features of data collection tools, such as randomization of the ordering of questions and responses to remove possible bias for the first question or answer presented.
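The branch and loop behavior described above can be sketched as a data-driven survey flow. The structures here are purely illustrative, not any particular data collection tool's format:

```python
def run_survey(products, answers):
    """Branch: term non-Republican respondents. Loop: four questions per product."""
    if answers.get("party") != "Republican":
        return "termed"   # branch logic ends the survey for this respondent
    questions = ["rating", "would_buy", "price_opinion", "repurchase"]
    # looping: the same four questions asked once for each product
    return [(product, q) for product in products for q in questions]

flow = run_survey(["Cleaner A", "Cleaner B"], {"party": "Republican"})
# A qualified respondent sees 2 products x 4 questions = 8 question instances.
```

A real tool would also handle randomization and nested branches, but the control flow is the same idea.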
The Survey Sample
A sample can be described simply as a set of willing respondents. There is a sizable industry around providing samples to survey researchers. These sample providers organize collections of willing respondents and provide access to these respondents to survey researchers for a fee.
A panel is a set of willing respondents selected by some criteria. We might have a panel of homeowners, a panel of airline travelers, or a panel of hematologists. Panelists almost always receive a reward for completing a survey. Often this is money, which may range from cents to hundreds of dollars; however, it can be another incentive, such as coupons or vouchers for consumer goods, credits for video purchases, or anything else that would attract the desired panelists. This reward is a major component of the cost per complete of a survey: the cost to get a completed survey.
Sample providers spend a lot of time maintaining their panels. The survey researcher wants assurance that the sample she purchases is truly representative of the market segment she is researching. Sample providers build their reputation on the quality of sample they provide. They use statistical tools, trial surveys, and other techniques to measure and document the sample quality.
Trackers and Waves
Many surveys are fielded only once, a one-off survey. Some surveys are fielded repeatedly. These are commonly used to examine the change in the attitude of the respondents over time. Researching the change in attitude over time is called longitudinal analysis. A survey that is fielded repeatedly is called a tracker. A tracker might be fielded monthly, quarterly, yearly, or at other intervals. The intervals are normally evenly spaced in time. Each fielding of a tracker is called a wave.
Verbatim Coding
In the survey research industry responses to open-end questions are called verbatims. In a closed-end question the set of possible responses from the respondent is known in advance. With an open-end question, the respondent can say anything. For example, suppose a company that sells laundry detergent has designed a new bottle for their product. The company sends a sample to 5,000 households and conducts a survey after the consumers have tried the product. The survey will probably have some closed-end responses to get a profile of the consumer, but to get an honest assessment of what the consumer thinks of the new package the survey might have an open-end question:
What do you dislike about the new package?
So, what does the survey researcher do with the responses to this question? Well, she could just read each verbatim. While that could provide a general understanding of the consumers’ attitudes, it’s really not what the company that is testing the package wants. The researcher would like to provide more specific and actionable advice to the company. Things like:
22% of women over 60 thought the screw cap was too slippery.
8% of respondents said the bottle was too wide for their shelves.
This is where verbatim coding, or simply coding, comes in. Codes are answers, just like for closed-end questions. The difference is that the codes are typically created after the survey is conducted and the responses are gathered. Coders are people trained in the art of verbatim coding, often working on a coding platform such as Ascribe Coder. Coders read the verbatims collected in the survey and invent a set of codes that capture the key points in the verbatims. The set of codes is called a codebook or code frame. For our question, the codebook might contain these codes:
- Screw cap too slippery
- Bottle too wide
- Not sufficiently child-proof
- Tends to drip after pouring
The coders read each verbatim and assign one or more codes to it. Once coding is complete, the researcher can easily see what percentage of respondents thought the cap was too slippery. You can see that armed with information from the closed-end responses the researcher could then make the statement:
22% of women over 60 thought the screw cap was too slippery.
Now you can see why the responses to open-end questions are called verbatims, not answers. The answers are the codes, and the coding process turns verbatims into answers. Put another way, coding turns qualitative information into quantitative information.
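Once verbatims carry codes, turning them into quantitative results is simple counting. A sketch with invented data, using the codes from the detergent example:

```python
from collections import Counter

# Each coded response: a respondent and the codes assigned to their verbatim.
coded_responses = [
    {"respondent": 1, "codes": ["Screw cap too slippery", "Bottle too wide"]},
    {"respondent": 2, "codes": ["Screw cap too slippery"]},
    {"respondent": 3, "codes": ["Tends to drip after pouring"]},
    {"respondent": 4, "codes": ["Screw cap too slippery"]},
]

# Count how many respondents mentioned each code ...
counts = Counter(code for r in coded_responses for code in r["codes"])
# ... and express each as a percentage of all respondents.
pct = {code: 100 * n / len(coded_responses) for code, n in counts.items()}
```

Cross-tabulating these counts against closed-end responses (such as age and gender) is what yields statements like "22% of women over 60 thought the screw cap was too slippery."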
Codebooks, Codes, and Nets
Let’s look at a real codebook. The question posed to the respondent is:
In addition to the varieties already offered by this product, are there any other old-time Snapple favorites that you would want to see included as new varieties of this product?
And here is the codebook:
- VARIETY OF FLAVORS
- like apple variety
- like peach variety
- like cherry variety
- like peach tea variety (unspecified)
- like peach iced tea variety
- like raspberry tea variety
- like lemon iced tea variety
- other variety of flavors comments
- HEALTH/ NUTRITION
- good for dieting/ weight management
- natural/ not contain artificial ingredients
- sugar free
- other health/ nutrition comments
- MISCELLANEOUS
- other miscellaneous comments
- NOTHING
- DON’T KNOW
Notice that the codebook is not a simple list. It is indented, with the items categorized by topic. The topics are called nets, and the items under them are codes. Nets are used to organize the codebook. Here the codebook has two major categories, one for people whose responses are that they like specific flavors and the other for people mentioning health or nutrition.
In this example, there is only one level of nets, but nets can be nested in other nets. You can think of it like a document in outline form, where the nets are the headers of the various sections.
Nets cannot be used to code responses. They are not themselves answers or responses to questions; instead they are used to organize the answers (codes).
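One way to picture the net/code distinction is a small tree in which only the leaves (codes) may be assigned to responses. The representation below is our own illustration, not Ascribe's data model:

```python
# Nets are the keys; the codes under each net are the values.
codebook = {
    "VARIETY OF FLAVORS": ["like apple variety", "like peach variety"],
    "HEALTH/ NUTRITION": ["sugar free", "good for dieting/ weight management"],
}

def codeable(label):
    """Only codes (leaves) can be assigned to a verbatim; nets cannot."""
    return any(label in codes for codes in codebook.values())
```

With deeper nesting, the values would themselves be dictionaries, like an outline with sub-sections.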
Downstream Data Processing
Once the questions in a study are coded they are ready to be used by the downstream data processing department in the survey research company. This department may be called data processing, tabulation, or simply tab. In tab, the results of the survey are prepared for review by the market researcher and then by the end client.
The tab department uses software tools to analyze and organize the results of the study. These tools include statistical analysis which can be very sophisticated. Normally, this software is not interested in the text of the code. For example, if a response is coded “like apple variety” the tab software is not interested in that text but wants a number like 002. From the tab software point of view, the respondent said 002, not “like apple variety”. The text “like apple variety” is used by the tab software only when it is printing a report for a human to read. At that time, it will replace 002 with “like apple variety” to make it human-readable. Before the data are sent to the tab department each code must be given a number. The codebook then looks like this:
- 001 VARIETY OF FLAVORS
- 002 like apple variety
- 003 like peach variety
- 004 like cherry variety
- 021 like peach tea variety (unspecified)
- 022 like peach iced tea variety
- 023 like raspberry tea variety
- 024 like lemon iced tea variety
- 025 other variety of flavors comments
- 026 HEALTH/ NUTRITION
- 027 good for dieting/ weight management
- 028 natural/ not contain artificial ingredients
- 029 sugar free
- 030 other health/ nutrition comments
- 031 MISCELLANEOUS
- 032 other miscellaneous comments
- 998 NOTHING
- 999 DON’T KNOW
The tab department may impose some rules on how codes are numbered. In this example the code 999 always means “don’t know”.
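The numbered codebook is effectively a lookup table: tab software works with the numbers and substitutes the labels only when producing human-readable reports. A sketch of that lookup, using the numbering shown above:

```python
# Code numbers as sent to the tab department (subset of the codebook above).
code_numbers = {
    1: "VARIETY OF FLAVORS",
    2: "like apple variety",
    998: "NOTHING",
    999: "DON'T KNOW",   # tab convention: 999 always means "don't know"
}

def label_for(number):
    """Tab software stores numbers; labels appear only in printed reports."""
    return code_numbers[number]
```

From the tab software's point of view, a respondent "said 002"; the text "like apple variety" appears only at report time.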
Choose Ascribe For Your Verbatim Coding Software Needs
When it comes to verbatim coding for open-ended questions in market research surveys, Ascribe offers unparalleled solutions to meet your needs. Our sophisticated coding platform, Ascribe Coder, is designed to streamline the process of categorizing and analyzing open-ended responses, transforming qualitative data into quantitative results. Whether you are dealing with responses from customer satisfaction surveys, employee feedback, or product evaluations, Ascribe provides the tools necessary for efficient and accurate verbatim coding.
If you are short on time or need further assistance with your verbatim coding projects, Ascribe Services can complete the coding project for you. They also offer additional services like Verbatim Quality review, AI Coding with human review, and translation with human review. Many of the top market research firms and corporations trust Ascribe for their verbatim coding needs. Contact us to learn more about coding with Coder.
8/15/18
Text Analytics & AI
Text Analytics for CX Management
You hear the term text analytics a lot these days. It seems to be used in so many different ways it is hard to pin down just what it means. I’ll try to clear it up a bit.
The basic idea is that a computer looks at some text to accomplish some goal or provide some service. Just what those goals and services might be we will get to in a moment. First note that text analytics works with text and analyzes it in some way. That’s the analytics part. When I say text, I mean electronic text, like this blog page or some comments from your customers stored in a spreadsheet. Text analytics does not work with voice recordings, videos, or pictures.
Because text analytics examines text written in a natural language (like English or French), it uses techniques from the sciences of Natural Language Processing (NLP) and Linguistic Analysis. The computer techniques used include machine learning, word databases (like a dictionary in a database), taxonomies, part of speech tagging, sentence parsing, and so on. You don’t need to understand these techniques to understand what text analytics does, or how to use it.
Just what can we use text analytics for? Some of the more common uses are:
- Document classification and indexing – Given a bunch of documents, the computer can figure out the key concepts and let you search a library of documents by these. A sophisticated example of this is E-Discovery used in legal practice, which seeks to use the computer to assist in discovery in legal proceedings.
- National security – Governments use computers to monitor web postings for information or discussions of interest to national security.
- Ad serving – We are all experienced with the uncanny ability of some web sites to show us ads that are relevant to our needs and interests. Text analytics applied to the pages we are viewing is a big part of this magic.
- Business intelligence – For most of us this is the big one! We can use the computer to gain insights into how to improve our service, retain customers, and give us a competitive advantage.
Text analytics for business intelligence is a rapidly growing market. Using the right tool, you can analyze thousands of customer comments in minutes. If the tool does a good job of presenting the results, it is amazing how quickly you can figure out what your customers care about, their pain points and plaudits, and your weaknesses and strengths.
Choosing a Text Analytics Tool
How do you find the right software product for your needs? Well, there are many providers of raw linguistic analysis capabilities. Google, Microsoft, Amazon, SAP, IBM, and many others provide such services via an API. But this takes an engineering effort on your part, and you still need to figure out how to navigate the results of the analysis.
There are several vendors of complete text analytics packages for customer experience management. As you evaluate these consider:
- Does the vendor specialize in customer experience feedback?
- Are the results of the analysis clear and insightful?
- Is text analytics a core competency or a side product?
- Are multiple languages supported?
- Is automatic translation between languages supported?
- How easy is it to tailor the text analytics to your specific business?
Navigating the Linguistic Analysis Results
Suppose you have a database of feedback from your customers. If you run 10,000 customer comments through a linguistic analysis engine it will produce a mountain of data. To gain insight from this data it needs to be organized and navigable. A good text analytics tool will organize the results and display them in a manner that helps you to find the actionable insights in the comments. Optimally the tool will help you pinpoint the strengths and weaknesses of your company from the customer's viewpoint.
Reducing the Raw Linguistic Analysis Data
Let's look at a specific example to gain an understanding of what is involved in organizing and presenting the results of linguistic analysis. As I described in Linguistic Analysis Explained, sentence parsing is one tool in the NLP analytics bag of tricks. Google has a very well regarded sentence parser, and it is available via an API. You can try it out at https://cloud.google.com/natural-language/. It's a fun way to get some insight into the complexities of presenting the results of linguistic analysis. Try running this comment through the API:
The support staff are helpful, but slow to respond.
Now take a look at the Syntax results. These are the results of the sentence parser. You find:
Wow, that's a lot of data, but hardly actionable insight! You can see that the sentence has been broken into tokens (words and punctuation). Each token has been tagged with its part of speech (slow is an adjective). Each token has also been assigned a parse label, indicating how it is used in the sentence (slow is used as a conjunction). The green arrows show how the tokens are interrelated. Imagine how much data would be generated by running 10,000 customer comments through the sentence parser!
The job of text analytics is to distill this pile of data down. In this case the analysis might go something like this:
- "Support staff" is the nominal subject of the root verb. That's a topic the customer has mentioned.
- "Helpful" is an adjectival complement (acomp) of the subject. The customer has said the staff are helpful.
- "Support staff" is further described, via the coordinating conjunction (cc) "but", as "slow" (conj).
So we find that the support staff are both helpful and slow. We have extracted a topic of "support staff", with expressions of "helpful" and "slow" associated with it. This reduction of the raw data from linguistic analysis has resulted in what we are interested in knowing. Our customer thinks the support staff is helpful, but also slow! This is a single finding from linguistic analysis.
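A toy version of this reduction over parser output can make the steps concrete. The token format below is a deliberate simplification we invented; a real parser such as Google's emits far more detail per token:

```python
# Each token: (text, dependency label), following the walk-through above.
tokens = [
    ("support staff", "nsubj"),   # nominal subject -> the topic
    ("are", "root"),
    ("helpful", "acomp"),         # adjectival complement -> an expression
    ("but", "cc"),                # coordinating conjunction
    ("slow", "conj"),             # conjoined adjective -> another expression
]

# Reduce the parse to a single finding: a topic plus its expressions.
topic = next(text for text, label in tokens if label == "nsubj")
expressions = [text for text, label in tokens if label in ("acomp", "conj")]
finding = (topic, expressions)
```

Multiplied across 10,000 comments, reductions like this are what turn the mountain of parser data into a navigable list of findings.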
Presentation of the Reduced Data
Now that we have extracted the desired nuggets from the raw linguistic analysis data we need to present them in a way that helps you find the insights you seek. An actual analysis of 10,000 customer comments may well produce 50,000 findings. To navigate these, the findings need to be organized in a way that emphasizes the important insights and allows you to explore the findings quickly and intuitively. A good text analytics tool will assist you in ways such as:
- Grouping similar topics, preferably automatically or with a custom taxonomy you control.
- Ranking topics by frequency of occurrence or other metrics such as sentiment.
- Allowing you to filter the results in various ways, such as by demographic data.
- Showing trends across time or other variables.
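The first two bullets, grouping and ranking, amount to aggregating findings by topic. A sketch with invented findings of the (topic, expression) shape discussed earlier:

```python
from collections import Counter

# Invented findings: (topic, expression) pairs extracted from comments.
findings = [
    ("support staff", "helpful"),
    ("support staff", "slow"),
    ("billing", "confusing"),
    ("support staff", "friendly"),
]

# Rank topics by frequency of occurrence.
by_topic = Counter(topic for topic, _ in findings)
ranked = [topic for topic, _ in by_topic.most_common()]
```

Filtering by demographics or trending over time would simply restrict or partition the findings list before this same aggregation.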
Summary
Text analytics for customer experience management reduces and organizes the results of linguistic analysis. If done well, the results are presented to you, the user, such that you can find actionable insights quickly, and explore the data to help formulate an action plan for improvement.
8/15/18
Text Analytics & AI
Taxonomies in Text Analytics
We are going to talk about taxonomies as used for text analytics. But to get to that, let's first look at what we mean by a taxonomy. In general terms, taxonomies classify things. You may remember this from biology classes. Biologists classify humans into primates, then mammals, and then the animal kingdom (skipping a few levels). But you get the idea. A human is a primate, a primate is a mammal, and a mammal is an animal. Taxonomies are an "is-a" hierarchy: each level belongs to the next level up with an "I am an instance of" relationship.
Taxonomies are an important tool in natural language processing in general, and in text analytics in particular. While a taxonomy can have several levels (Animal => Mammal => Primate => Human), in text analytics it is common to have only two. For example, we could have a taxonomy that looks like this:
- Government agency
- Internal Revenue Service
- United States Treasury
- Office of Management and Budget
- Commercial enterprise
- Apple
- IBM
- Tesla
- Non-Profit
- American Red Cross
- Sierra Club
- Doctors without borders
This taxonomy classifies organizations by type. In the realm of text analytics for customer experience management we are more likely to want to classify specific topics into more general topics. We can think of this as placing topics into groups. Supposing we are an airline company, we might want a taxonomy like this:
- Baggage
- Bag
- Luggage
- Suitcase
- In-flight technology
- Wi-Fi
- Video
- Movies
- Airport amenities
- Lounge
- Gate
- Tram
The intent of the taxonomy is to group customer mentions so that we can consider them as a whole. Suppose we used text analytics to find the topics mentioned in an airline customer satisfaction survey. The taxonomy above could be used to group the topics into the general areas we want to consider together. This application of the concept of a taxonomy is really a way of grouping topics into broader concepts that make sense for our particular business. It is a business-domain-specific method of grouping or aggregating, and a powerful technique for tailoring text analytics findings to a specific business vertical. When used in this way we can view the taxonomy as a list of groups, each of which has one or more synonyms. The taxonomy maps synonyms into groups. As you can imagine from these brief examples, real taxonomies can become very large. There are techniques to manage this. First, some software products are capable of creating taxonomies automatically. These use algorithms that examine the results of the text analytics and attempt to create taxonomies suitable for the analyzed text. Second, the taxonomy may allow more powerful means of specifying synonyms than simple text matching. For example, the software may allow the use of regular expressions to specify synonyms.
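The idea of mapping synonyms into groups, including regular-expression synonyms, can be sketched as follows. The taxonomy structure and patterns here are illustrative assumptions, not the format of any specific product:

```python
import re

# A two-level taxonomy as a mapping from group name to synonym patterns.
# Regular expressions let one entry cover variants ("bag", "bags", "baggage").
taxonomy = {
    "Baggage": [r"\bbag(gage|s)?\b", r"\bluggage\b", r"\bsuitcase\b"],
    "In-flight technology": [r"\bwi-?fi\b", r"\bvideo\b", r"\bmovies?\b"],
}

def classify(topic):
    """Return the taxonomy group for a topic, or None if unmatched."""
    for group, patterns in taxonomy.items():
        if any(re.search(p, topic, re.IGNORECASE) for p in patterns):
            return group
    return None

print(classify("checked bags"))  # Baggage
print(classify("free wifi"))     # In-flight technology
```

Topics that match no pattern would be left ungrouped for the analyst to review, which is one way automatic taxonomy builders bootstrap their suggestions.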
8/15/18
Read more
Text Analytics & AI
Ascribe and Google Surveys: Better decisions with market research
By Google Surveys, published on Google.com on June 23, 2017. Ascribe, a Google Surveys Partner, is featured on the Google Analytics Solutions Success Stories page with an overview of how the partnership led to innovation and expanded access to consumer insights for researchers. Read the full feature
2/20/18
Read more
Text Analytics & AI
Measuring & Understanding Sentiment Analysis Score
Editor’s note: This post was originally published on Ascribe in October 2020 and has been updated to reflect the latest data.
Sentiment analysis (or opinion mining) is used to understand the emotion or sentiment behind comments and text, allowing data analysts to gain actionable insights from verbatim comments. While measuring and understanding sentiment analysis scores is more involved than analyzing closed questions, it offers a valuable source of metric data.
What Does Sentiment Mean?
Audiences will have opinions on products, services, and more, that are either positive, negative, or neutral in tone. Companies can use this information to better understand the feedback given by audiences on products or how effective or ineffective messaging has been. Sentiment analysis provides your business with a way to quantify these emotions to discover the overall answer polarity and insights into your customer feedback.
Because customer sentiment is provided in the person’s voice, and not based on a set response or keywords, you need a way for your computers to understand it. Natural Language Processing (NLP), combined with machine learning, allows your sentiment analysis solution to look at a data set and pull more meaning from it. It does this by scoring each response based on whether the algorithm thinks that it’s positive, negative, or neutral.
While applying a sentiment score to the entire response can be useful, it does have problems. For example, would you say that this comment is positive or negative?
The food was great but the service was awful.
More sophisticated sentiment analysis can apply sentiment scores to sections of a response:
- Food, great: positive
- Service, awful: negative
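A toy version of this clause-level scoring can be written by splitting on contrast conjunctions and scoring each clause with a tiny lexicon. This is a minimal sketch with an invented four-word lexicon; real aspect-based sentiment models are far more sophisticated:

```python
import re

# Invented mini-lexicon for illustration only.
LEXICON = {"great": 1, "good": 1, "awful": -1, "bad": -1}

def clause_sentiments(comment):
    """Split a comment on contrast markers and score each clause."""
    clauses = re.split(r"\bbut\b|;", comment.lower())
    results = []
    for clause in clauses:
        score = sum(LEXICON.get(w, 0) for w in re.findall(r"\w+", clause))
        results.append((clause.strip(), score))
    return results

for clause, score in clause_sentiments("The food was great but the service was awful."):
    print(clause, score)
# the food was great 1
# the service was awful. -1
```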
Analyzing versus Interpreting
While analysis and interpretation are often used interchangeably, they have two different meanings, especially within data science and sentiment analysis work. Interpreting sentiment in a series of responses is more of a qualitative assessment. If you manually process verbatim comments to determine sentiment, your overall results can contain individual biases and errors. With sentiment analysis tools, the potential for bias and interpretation error is greatly diminished in favor of a faster, automated analysis.
Sentiment analysis programs have a standardized approach that gets the same results regardless of the person running the process. It’s difficult to achieve this manually, but computer-aided methods make it possible.
Positive and Negative Keywords
Positive Words
These text analysis words are representative of positive sentiment. Lists like this exist, but the words in them shift over time, and machine learning models are highly sensitive to context, which makes the framing of the comment important. With these machine learning models, companies can find out what people like about products and services while highlighting their experiences. This is a good way to see what you’re doing right and where you compare favorably to the competition. You can build on these successes as you move forward as a company.
Here are some of these words:
- Acclaim
- Brilliant
- Convenient
- Durable
- Enjoyable
- Ethical
Negative Words
These words are commonly associated with negative sentiment. These sentiments can indicate areas where you’re failing to deliver on expectations. It’s also a good way to see whether a product or service has a widespread problem during a rollout, to identify issues in the customer experience, and to find other areas of improvement that you can prioritize.
Here are some of these words:
- Dishonest
- Failure
- Gruesome
- Hazardous
- Imbalance
- Lackadaisical
Neutral Sentiment Words
Neutral sentiments are driven by context, so it’s important to look at the whole comment. Excelling in the customer experience means going beyond “okay” and moving in a positive direction. These middle-of-the-road sentiments are useful in determining whether your company is noteworthy in a product or service category.
Positive to Negative Comment Ratio
A sentiment ratio is a score that compares how positive and negative comments are represented in the data. Generally, this is expressed on a scale of -1 to 1, with the low end of the scale indicating negative responses and the high end indicating positive responses. You may need to adjust how you evaluate the score to account for trends in your audience, as some audiences are more negative than the general population. For example, if you were conducting a survey focused on dissatisfied customers, you would be dealing with a tone that’s more negative than usual.
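One common way to put comment counts on a -1 to 1 scale is the net ratio (positive minus negative over the total). This is one formulation among several; exact definitions vary by tool:

```python
def sentiment_ratio(positive, negative):
    """Net sentiment on a -1..1 scale: (pos - neg) / (pos + neg)."""
    total = positive + negative
    if total == 0:
        return 0.0  # no opinionated comments at all
    return (positive - negative) / total

print(sentiment_ratio(80, 20))  # 0.6  (strongly positive)
print(sentiment_ratio(20, 80))  # -0.6 (strongly negative)
```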
What is a Good Sentiment Score?
A good sentiment score depends on the scoring model that you’re using. Set minimum scores for your positive and negative threshold so you have a scoring system that works best for your use case.
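Setting thresholds for your use case might look like the sketch below; the cutoff values are arbitrary examples you would tune against your own data:

```python
def classify_score(score, pos_threshold=0.2, neg_threshold=-0.2):
    """Bucket a -1..1 sentiment score using tunable thresholds."""
    if score >= pos_threshold:
        return "positive"
    if score <= neg_threshold:
        return "negative"
    return "neutral"

print(classify_score(0.6))   # positive
print(classify_score(0.0))   # neutral
print(classify_score(-0.5))  # negative
```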
How Accurate is Sentiment Analysis?
The accuracy of sentiment analysis depends on the method that you’re using to work with your verbatim comments, the quality of the data that you’ve given the computer, and the subjectivity of the sentiment. You want the most accurate results possible, which typically means that you’ll want to have a computer assisting your researchers with this process. The automated system can reduce the potential for bias and also use a standardized set of rules for going through the data. If there are any problems with accuracy, you can feed more data into the sentiment analysis solution to help it learn what you’re looking for.
What Algorithm is Best for Sentiment Analysis?
The algorithm that works best for sentiment analysis depends on your resources and your business needs. There are three major categories in algorithms: machine learning, lexicon-based, and those that combine both machine learning algorithms and lexicons.
Machine learning is one of the most popular approaches in both data science and text analytics, and is an application of artificial intelligence. It allows your sentiment analysis solution to keep up with changing language in real time. Because data scientists can’t predict when the next shift in colloquial slang and voice will occur and completely change what reads as negative or positive, they’ve begun to use machine learning on operational data to understand natural language and current vernacular. This is a core component of sentiment analysis and is an example of supervised learning, where you feed the model representative, labeled results so it can learn from them. Unsupervised learning refers to machine learning that is not based on data specifically designed to train it. Deep learning refers to the complexity of machine learning, with this moniker usually referring to complex neural networks.
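The supervised-learning idea can be illustrated with a deliberately tiny classifier: count how often each word appears in labeled positive versus negative training comments, then score new comments by which class their words favor. This toy sketch stands in for real models (naive Bayes, neural networks), which are far richer:

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def train(examples):
    """Count word occurrences per class from (text, label) pairs."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts

def predict(counts, text):
    """Score a comment by net positive-minus-negative word evidence."""
    score = sum(counts["positive"][w] - counts["negative"][w]
                for w in tokenize(text))
    return "positive" if score >= 0 else "negative"

model = train([
    ("The crew was brilliant and the seats were enjoyable", "positive"),
    ("Boarding was a failure and the gate was hazardous", "negative"),
])
print(predict(model, "brilliant crew"))  # positive
print(predict(model, "total failure"))   # negative
```

Retraining on fresh labeled comments is how such a model keeps up with shifting vernacular, which is exactly what a static lexicon struggles to do.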
A lexicon-based algorithm relies on a list of words and phrases marked as positive or negative. Its main drawback is that the lexicon is difficult to keep up to date with the latest trends in language.
Ascribe’s dedication to Sentiment Analysis
If you are looking to leverage sentiment analysis when analyzing verbatim comments and open-ended text responses to uncover insights and empower decision-making, check out Ascribe’s text analytics offering, CX Inspector.
CX Inspector is a customizable and interactive text analytics tool with compatible APIs and unique machine learning techniques that provide topic and sentiment analysis from verbatim comments automatically. Analyze everything from the success of marketing campaigns, product feedback results, product reviews, social media platform comments, and more.
For a more comprehensive solution for sentiment analysis, use X-Score, a feature within CX Inspector that provides a sentiment score from open-ended comments. X-Score is a great measure of customer satisfaction and also identifies the largest drivers of positive and negative sentiment.
Read more