Sentiment analysis using emojis 🧐: the example of COVID-19 vaccination
With the explosion of smartphone and internet usage, the use of emojis in messages has grown. According to a 2015 report from the research team at Emogi, emojis were used by 92% of the online population. That company even estimated that more than 2.3 trillion messages containing at least one emoji would be sent in 2016. These small images embedded in electronic messages are a data source that should not be overlooked.
In-depth analysis of these emojis makes “opinion mining” possible, that is, sentiment analysis performed on large volumes of digital text.
This is what we did on the subject of COVID-19 vaccination.
Extraction of relevant data from Twitter
For this analysis, we focused on the Twitter community. To select the tweets related to this theme, we ran queries on the following terms:
These queries were deliberately broad in order to get a global, international view of the topic on Twitter.
Using our extraction tool (written in Python), we extracted all the tweets containing at least one of these terms over the period from March 25 to 26. This gives us a corpus of more than 50,000 tweets.
Once the extraction and cleaning are done, the next step is to focus only on the emojis.
These are meaningful when put in context, which is why they are particularly interesting for sentiment analysis.
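Isolating the emojis in a cleaned tweet can be sketched as follows. This is a rough approximation and not the extraction tool used for the study: the function name `extract_emojis` and the Unicode ranges are assumptions chosen for illustration. The pattern treats a pair of regional indicator symbols as one flag, and it will split skin-tone modifiers and joined (ZWJ) sequences into separate pieces.

```python
import re

# Assumed, simplified ranges: they cover most common emojis but not all of them.
EMOJI_PATTERN = re.compile(
    "[\U0001F1E6-\U0001F1FF]{2}"   # a flag: two regional indicator symbols
    "|[\U0001F300-\U0001FAFF]"     # main pictograph blocks
    "|[\u2600-\u27BF]"             # miscellaneous symbols and dingbats
)


def extract_emojis(text):
    """Return the emojis found in `text`, in order of appearance."""
    return EMOJI_PATTERN.findall(text)


if __name__ == "__main__":
    print(extract_emojis("Vaccinated today 💉🎉 in 🇫🇷!"))
```

A dedicated emoji library or the official Unicode emoji data files would give more complete coverage than these hand-picked ranges.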
A study from AMU (Aix-Marseille University) lists several uses of emojis:
- to add an emotion to a neutral sentence
- to enhance a feeling in an already emotional sentence
- to modify the emotion of a text, for example through irony or sarcasm
- to convey simple information more quickly than in words
- for “fun” because the sender finds it amusing
Emoji analysis is subject to bias, notably the subjectivity of whoever interprets their meaning. In our case, the data consultant has to limit this bias, which is why it is necessary to know how emojis are linked to one another in order to cross-reference the information.
The transformation into a graph
The best way to have a global view is to put all this data in the form of a graph, i.e. nodes and links. If you want to learn more about this representation you can read our article dedicated to this subject.
To create it, we use a Python program with two rules:
- when a new emoji appears, a node is created; if the emoji is already present, the existing node's count is incremented
- when several emojis are present in the same tweet, a link is created between them
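The two rules above can be sketched with the standard library alone. This is a minimal illustration, not the program used for the study: the function name `build_emoji_graph` is hypothetical, and two details are assumptions because the text does not specify them. Node counts are incremented once per occurrence, and each link carries a weight that grows by one for every tweet in which the two emojis appear together.

```python
from collections import Counter
from itertools import combinations


def build_emoji_graph(tweets_emojis):
    """Build node counts and weighted links from per-tweet emoji lists.

    tweets_emojis: iterable of lists, one list of emojis per tweet.
    Returns (nodes, edges) where nodes maps emoji -> count and
    edges maps a sorted (emoji_a, emoji_b) pair -> co-occurrence weight.
    """
    nodes = Counter()
    edges = Counter()
    for emojis in tweets_emojis:
        # Rule 1: create the node or increment its count (assumed: per occurrence).
        nodes.update(emojis)
        # Rule 2: link every pair of distinct emojis found in the same tweet.
        for a, b in combinations(sorted(set(emojis)), 2):
            edges[(a, b)] += 1
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_emoji_graph([["😀", "💉"], ["💉", "💉", "🎉"], ["😀"]])
    print(nodes)
    print(edges)
```

Sorting each pair before storing it keeps the graph undirected, so the same two emojis always map to the same link regardless of their order in the tweet.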
Once the graph is constructed, we use specialised graph processing software to lay it out and run a community detection algorithm.
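The article does not name the software or the algorithm used, so the following is only an illustration of what community detection does. It swaps in a simple weighted label propagation, which is not necessarily the method applied here. The function name `label_propagation`, the fixed visiting order, and the tie-breaking rule are all assumptions made to keep the sketch deterministic.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=20):
    """Group nodes into communities with a simple label propagation.

    edges: dict mapping (node_a, node_b) pairs to a positive weight.
    Returns a list of sets, one set of nodes per community.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    # Every node starts in its own community.
    labels = {node: node for node in neighbours}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):
            # Sum the link weights towards each neighbouring label.
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            # Adopt the heaviest label; ties go to the smallest label (assumption).
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return sorted(communities.values(), key=sorted)
```

On real data, dedicated graph tools offer more robust algorithms and also handle the layout step, which this sketch does not attempt.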
In our case, we end up with this result.
We can clearly see that this graph is composed of two distinct parts: at the top, a part made up of all kinds of emojis, and at the bottom, a part made up of flag emojis.
Contextualisation and sentiment analysis
To fully understand the meaning of an emoji, it is essential to put it into context: the context of the application, the conversational context, the social context, etc.
A perfect example is the use of this emoji :
- in the western world it is understood as a sign referring to Italian culture
- in India it is used to ask if someone is hungry
- in the Arab world it is used to call for calm
After running our community detection algorithm again on the top part of the graph, we see three sub-subjects:
- enthusiasm (green)
- scepticism (blue)
- anger/fear (red)
Interestingly, the anger/fear sub-subject is half as large as the other two. The enthusiastic sub-subject is made up of emojis such as 🥂 or 💃, which seem to evoke the idea/possibility of resuming a “normal” life.
Within the flag part of the graph, we see two large communities and two smaller ones.
To go further, the analysis should be segmented by country.