
Topics

Topic analysis is undertaken using Natural Language Processing (NLP) techniques and Latent Dirichlet allocation (LDA) models to analyze the phrases spoken by customers across many calls and then group similar or related phrases into separate topics.

Put simply, to uncover the “Voice of the Customer” we:

  • Take the customer side of a conversation that has been generated using our SpeechAI service

  • Identify key phrases within these conversations

  • Automatically group these phrases into clusters or topics

By way of example, calls into a call center may cover multiple topics such as:

  • Technical support

  • Sales

  • Complaints

  • Cancellation

You would expect a Technical Support call to contain more technical support phrases than a Sales call would. So, a Technical Support call is likely to contain phrases such as:

  • cannot log in

  • router not working

  • no internet

  • computer doesn’t turn on

You would not expect to see these phrases in a sales call or, if you did, they would occur less frequently than phrases relating to sales, such as “like to purchase”, “interested in”, “credit card”, etc. Hence, by mapping the frequency at which phrases occur in a call and comparing this across all calls, we can group together similar calls into clusters or topics.

Even where a call covers multiple topics (e.g. Complaints and Technical Support), we examine how frequently Complaint phrases occur (such as “I’d like to speak to the manager”, “I wish to complain”, “I’m not happy”, and “this is ridiculous”) and compare this to the frequency of Technical Support phrases to determine whether the call was mainly a Complaint call or a Technical Support call. In some instances, calls may be included in multiple topics.
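The comparison described above can be sketched in a few lines of Python. The phrase lists below are illustrative stand-ins, not the actual lexicons used by the service:

```python
from collections import Counter

# Illustrative phrase lists -- not the actual lexicons used by the service.
COMPLAINT_PHRASES = {"speak to the manager", "wish to complain",
                     "not happy", "this is ridiculous"}
TECH_SUPPORT_PHRASES = {"cannot log in", "router not working", "no internet"}

def dominant_topic(call_phrases):
    """Count how often each topic's phrases occur in one call and
    return the topic with the higher count."""
    counts = Counter()
    for phrase in call_phrases:
        if phrase in COMPLAINT_PHRASES:
            counts["Complaint"] += 1
        if phrase in TECH_SUPPORT_PHRASES:
            counts["Technical Support"] += 1
    return counts.most_common(1)[0][0] if counts else "Unknown"

call = ["no internet", "router not working", "not happy", "cannot log in"]
print(dominant_topic(call))  # Technical Support
```

A real implementation weighs phrases rather than treating them equally, but the principle is the same: the relative frequency of topic phrases decides the call's main topic.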

This is Big Data at work. We analyze thousands of your calls to discover hidden topics of which you may have been previously unaware.

Across your voice data, we:

  • Identify 10 key topics

  • Identify the top ten phrases within each topic

  • Weight each phrase so that you understand how important or relevant that phrase was in determining the topic

  • List the calls that are included in the topic

A separate report lists the phrases that have been used to generate the topics so that you can examine which phrases have been most prevalent in generating the 10 topics.

Natural Language Processing (NLP)

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Source: wikipedia

Latent Dirichlet allocation (LDA)

In natural language processing, the latent Dirichlet allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics. LDA is an example of a topic model and belongs to the machine learning field and in a wider sense to the artificial intelligence field.

Source: wikipedia
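LDA's core assumption — that each document (call) is a mixture of topics, and each topic is a distribution over words — can be illustrated with a toy calculation. The topics, words, and probabilities below are invented for illustration only:

```python
# Toy illustration of the LDA mixture assumption. All numbers are invented.
topic_word = {                      # p(word | topic)
    "tech":  {"router": 0.5, "internet": 0.4, "purchase": 0.1},
    "sales": {"router": 0.1, "internet": 0.1, "purchase": 0.8},
}
doc_topic = {"tech": 0.7, "sales": 0.3}   # p(topic | this call)

def p_word_given_doc(word):
    """p(w | d) = sum over topics k of p(w | k) * p(k | d)."""
    return sum(doc_topic[k] * topic_word[k][word] for k in doc_topic)

print(round(p_word_given_doc("router"), 2))    # 0.5*0.7 + 0.1*0.3 = 0.38
print(round(p_word_given_doc("purchase"), 2))  # 0.1*0.7 + 0.8*0.3 = 0.31
```

Fitting LDA works in the opposite direction: given only the observed words, it infers the hidden topic-word and document-topic distributions that best explain them.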

Trends

Separate from the Topic analysis is a Trending analysis.

The Trending analysis identifies the most frequent and important words and phrases spoken by your customers. However, it is not as simple as counting the number of times a word is spoken; otherwise, words such as “a”, “the”, and “at” would be at the top of the list. Such words are referred to as Stop Words: words that add little or no meaning to the analysis and so can be removed (or “stopped”) from it.
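A minimal stop-word filter can be sketched as follows. The stop list here is illustrative only, not the list used by the service:

```python
# Illustrative stop list -- real stop lists run to hundreds of words.
STOP_WORDS = {"a", "the", "at", "is", "my", "to"}

def remove_stop_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The router at my house is not working"))
# ['router', 'house', 'not', 'working']
```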

After stop words are removed, we:

  • Count the frequency of each word

  • Identify 2- and 3-word phrases (known as n-grams) and count the frequency of each phrase

  • List the top 100 words and top 100 phrases

  • For each word and phrase, provide a list of the calls that include that word or phrase with links to Wordbench
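The counting steps above can be sketched with Python's standard library. The transcript fragment is invented for illustration:

```python
from collections import Counter

def ngrams(words, n):
    """Return all contiguous n-word phrases from a list of words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Invented fragment of a (stop-word-filtered) customer transcript.
words = "no internet no internet router not working no internet".split()

word_counts = Counter(words)                 # frequency of each word
bigram_counts = Counter(ngrams(words, 2))    # frequency of each 2-word phrase
trigram_counts = Counter(ngrams(words, 3))   # frequency of each 3-word phrase

print(word_counts.most_common(2))    # [('no', 3), ('internet', 3)]
print(bigram_counts.most_common(1))  # [('no internet', 3)]
```

The same counters, built across all calls and truncated to the top 100 entries, give the trending word and phrase lists.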

However, sometimes the most frequent words are not necessarily the most important, at least not in terms of identifying trending topics. For example, it is not uncommon to see the company name among the top 10 most frequent words, but it doesn’t really help in analyzing why your customers are calling. Therefore, to identify the most important words and phrases we employ an advanced algorithm known as Term Frequency Inverse Document Frequency (TF-IDF). We’ll do a deeper dive into TF-IDF later in this article. For now, suffice it to say that TF-IDF is a weighting or score given to each word and phrase to signify its importance in a call and across multiple calls.
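To sketch the idea, here is the common TF-IDF variant with a logarithmic inverse document frequency (the exact weighting used by the service may differ; “acme” is a stand-in company name):

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """TF-IDF: how often the term appears in one call (TF), discounted
    by how many calls in the corpus contain it at all (IDF)."""
    tf = Counter(doc)[term] / len(doc)
    n_containing = sum(1 for d in corpus if term in d)
    idf = math.log(len(doc_list := corpus) / n_containing)  # term must occur somewhere
    return tf * idf

calls = [
    "acme router not working".split(),   # "acme" = stand-in company name
    "acme cannot log in".split(),
    "acme like to purchase".split(),
]

# The company name appears in every call, so its IDF -- and score -- is zero:
print(tf_idf("acme", calls[0], calls))             # 0.0
print(round(tf_idf("router", calls[0], calls), 3))
```

This is why a ubiquitous company name drops out of the TF-IDF rankings even though it tops the raw frequency counts: a word found in every call carries no discriminating information.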

Stop Words

Stop words are the words in a stop list (or stoplist or negative dictionary) which are filtered out (i.e. stopped) before or after processing of natural language data (text). There is no single universal list of stop words used by all natural language processing tools, nor any agreed upon rules for identifying stop words, and indeed not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose.

Source: wikipedia

n-gram

In the fields of computational linguistics and probability, an n-gram (sometimes also called Q-gram) is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus.

Source: wikipedia
