Innovative insights – data analytics powered by A.I.
March Article 2019
Photo by Robert Anasch on Unsplash
Hi everyone. David from rondofiniti here.
I thought I’d write an article on consumer sentiment analysis. Sentiment, in very basic terms, describes how a person feels about a particular subject, generally in a positive or negative sense. When a consumer submits a review for a restaurant, product, airline etc. on a public website, they usually enter an alias, an overall rating (1 to 5 or 1 to 10; the higher the number, the higher the satisfaction), a title and comments. Potential consumers can read these reviews and decide whether to purchase a particular product or service. For the supplier of the goods or services, it is obviously in their best interest for customers to submit higher ratings on publicly available websites, as this encourages more business.
Businesses can also use data from these publicly available sites to determine what the consumer is really looking for and to tailor their product or service to meet that need, provided they take the opportunity this valuable data offers.
For this analysis, we used publicly available data from TripAdvisor and looked at one particular restaurant (which will remain nameless for this investigation). The restaurant has been in operation for over a decade and has almost 500 independent reviews.
Using this data, we wanted to test the following hypotheses:
H0: The overall consumer ratings do not match the overall trend of the average sentiments of their respective consumer comments. (i.e.: no linear relationship)
H1: The overall consumer ratings do match the overall trend of the average sentiments of their respective consumer comments. (i.e.: linear relationship)
If we can reject the null hypothesis (i.e. the data support H1), we don’t have to rely only on consumer rating values; we can also utilise the general sentiments extracted from the text comments submitted by consumers.
The high-level steps used to formulate the analysis were:
1. Extract the review data from the website and place the data into a comma-separated values (CSV) file.
2. Extract the sentiment of each text entry using the statistical programming language ‘R’. For people not familiar with sentiment analysis or data analytics tools, here is a very high-level explanation. Each word in a sentence is given a positive or negative score from a dictionary of polarised words (positive-feeling and negative-feeling words are assigned values; the higher the value, the more positive the sentiment, and vice versa). Through some clustering, weighting of values and other functions, a sentiment value for the entire text is determined. Again, the higher the value, the more positive the sentiment. For you ‘R’ nuts out there, we removed stop words (commonly used terms such as ‘a’, ‘an’, ‘in’ etc.) prior to the analysis.
3. Plot the following data:
3a. Average sentiment of the reviews vs consumer rating values for each reviewer
3b. Average overall review sentiments vs each rating class
3c. Histogram data of review ratings
4. Conduct a regression analysis to confirm or reject the hypothesis. Here we are checking the data for linearity.
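To make step 2 a little more concrete, here is a minimal sketch of dictionary-based sentiment scoring. It is written in Python rather than ‘R’ and is not the code used for this analysis. The tiny word dictionary, its scores, the stop-word list and the choice to divide by the square root of the word count are all assumptions made purely for illustration; a real analysis would use a published polarity dictionary with thousands of entries.

```python
import re

# Hypothetical toy dictionary: the words and scores are invented for illustration.
POLARITY = {"great": 1.0, "delicious": 1.0, "friendly": 0.75,
            "slow": -0.5, "cold": -0.5, "terrible": -1.0}

# Hypothetical stop-word list: common words that carry no sentiment.
STOP_WORDS = {"a", "an", "and", "in", "the", "was", "but", "were", "is"}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Sum the polarity scores of the non-stop words, then divide by the
    square root of the number of those words (an assumed normalisation)."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    if not words:
        return 0.0
    total = sum(POLARITY.get(w, 0.0) for w in words)
    return total / len(words) ** 0.5


# Made-up review comments, not taken from the restaurant's data.
reviews = [
    "The food was delicious and the staff were friendly",
    "Terrible service and the soup was cold",
]
scores = [sentiment(r) for r in reviews]
```

The first made-up comment ends up with a positive score and the second with a negative one, which is the behaviour described in step 2: the higher the value, the more positive the feeling of the text.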
There is a large amount of general variability or spread of data (Figure 1) when we consider individual review sentiments versus each rating class. This is to be expected. Reviewers can use vastly different language to describe their feelings in comparison to each other and thus produce a different sentiment score. Also, some reviewers use harsher language to describe the same feelings in comparison to another person, which would lead to more variability or spread.
Furthermore, the histogram (Figure 2) shows that a rating of 4 was chosen most often by consumers, whilst a rating of 1 was chosen least often. You could say it’s a pretty good restaurant at face value.
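The two summaries behind the figures (how many reviews fall in each rating class, and the average sentiment within each class) can be sketched in a few lines. Again this is Python rather than ‘R’, and the (rating, sentiment) pairs below are hypothetical values invented for illustration, not the restaurant’s data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (rating, sentiment score) pairs, one per review.
rows = [(5, 0.62), (5, 0.48), (4, 0.35), (4, 0.41), (4, 0.20),
        (3, 0.10), (2, -0.15), (1, -0.40)]

# Group the sentiment scores by rating class.
by_rating = defaultdict(list)
for rating, score in rows:
    by_rating[rating].append(score)

# Histogram data: number of reviews in each rating class.
counts = {r: len(s) for r, s in sorted(by_rating.items())}

# Average sentiment for each rating class.
averages = {r: mean(s) for r, s in sorted(by_rating.items())}

# The rating class chosen most often.
most_common_rating = max(counts, key=counts.get)
```

Individual scores within a class can differ quite a lot, which is the spread discussed above; averaging within each class smooths that out.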
When we consider the ‘average overall sentiment’ of all consumers (red crosses) for each rating class (Figure 1), the general trend seems to be positively linear (the higher the rating, the higher the reviewer text sentiment value). Sorry to throw some statistical analysis into the article, but from the regression results (checking how linear the data is) shown in Table 1, we can say:
• The overall model is significant (F = 326), so we reject H0: p = 0.00 < 5%.
• The ‘Rating’ coefficient is significant, so we reject H0: p = 0.00 < 5%.
• 99% of the variation in ‘average overall sentiment’ is described by the variation in consumer rating.
Hence it is plausible to reject the null hypothesis and conclude that the ‘average overall sentiment’ is indeed positively and linearly related to the overall consumer ratings. In other words, we can say with statistical support that the rating values consumers provide (1 to 5) are generally mirrored by the sentiment of the text comments they include in a review.
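For readers curious about where numbers such as the slope, the share of variation explained and the F value come from, here is a minimal sketch of a straight-line (least-squares) fit. It is in Python rather than ‘R’, the class averages are hypothetical values invented for illustration, and it does not reproduce Table 1. The p-value is left out; it would normally be read from the F distribution using a statistics package.

```python
from statistics import mean

# Hypothetical average sentiment per rating class (illustrative only).
ratings = [1, 2, 3, 4, 5]
avg_sentiment = [-0.40, -0.15, 0.10, 0.32, 0.55]


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return the
    slope, intercept, R-squared and F statistic (one predictor)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Total variation in y, and the part the line fails to explain.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    # With a single predictor, F = R^2 * (n - 2) / (1 - R^2).
    f_stat = r_squared * (n - 2) / (1 - r_squared)
    return slope, intercept, r_squared, f_stat


slope, intercept, r_squared, f_stat = simple_ols(ratings, avg_sentiment)
```

A positive slope together with an R-squared close to 1 is what a strong positive linear relationship looks like in these terms.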
Yes: Consumer text reviews do matter!
This analysis only considers one restaurant; it should be expanded to many more.
This is a very broad directional analysis.
Sentiment variability could be reduced if there was a way to standardise the language used between each reviewer.
Also, surveys may not be very representative of reality if:
Only happy or angry people do them.
People who think they have a good idea to impart do them.