Are Emojis Predictable?

Emojis are cool, right? Well typing that sure didn’t feel cool, but whatever. The paper “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, Horacio Saggion explores the relationships between words and emojis by creating robot-brains that can predict which emojis humans would use in emoji-less tweets.

But, what exactly are emoji (also is the plural, emoji, or emojis?) and how do they interact with our text messaging? Gretchen McCulloch says you can think about them like gestures. So if I threaten you by dragging a finger across my throat IRL, a single emoji of a knife might do the trick in a text. But if they act like gesture in some cases, what are we to make of the unicorn emoji? Or the zombie? It‘s not representative of eating brains right? Right?? Tell me the gesture isn’t eating brains!

So, obviously,  trying to figure out what linguistic roles emoji can play is tough and it doesn’t help that they haven’t been studied all that much from an Natural Language Processing (NLP) perspective. Not to mention the perspective of AI. Will emoji robots take over the world like that post-apocalyptic dystopian hellscape depicted in movies like… the Emoji Movie and…Lego Batman? Studying emojis will not only protect us from the emoji-ocalypse, but also help analyze social media content and public opinion. That’s called sentiment analysis btw, but more on all things I just tried to learn later.

The Study (or Machine Learning Models, oh my 😖)

For this study, the researchers (from my alma mater, Universitat Pompeu Fabra) used the Twitter APIs to determine the 20 most frequently used emojis from 40 million tweets out of the US between Oct 2015 and May 2016. Then they selected only those tweets that had a single emoji from the top 20 list. It was more than 584600 tweets. Then they removed the emoji from the tweet and trained machine learning models to predict which it was. Simple, right?

Now just to be clear, the methods in this study are way above my head. I don’t want anyone confusing me for someone who understands exactly what went on here because I was fully confused through the entire methods section. I tried to summarize what little understanding I think I walked away with, but found there was just way too much content. So here is a companion dictionary of terms for the most computationally thirsty bishes (link).

So actually two experiments were performed. The first was comparing the abilities of different machine learning models to predict which emoji should accompany a tweet. And the second was comparing the performance of the best model to human performance.

The Robot Face-Off (🤖 vs 🤖)

In the first experiment, the researchers removed the emoji from each tweet. Then they used 5 different models (see companion dictionary for more info) to predict what the emoji had been:

  1. A Bag of Words model
  2. Skip-Gram Average model
  3. A bidirectional LSTM model with word representations 
  4. A bidirectional LSTM model with character-based representations 
  5. A skip-gram model trained with and without pre-trained word vectors

They found that the last three (the neural models) performed better than the first two (the baselines). From this they drew the conclusion that emoji collocate with specific words. For example, the word love collocates with ❤. I’d also like to take a moment to point out this study which points out the emojis are mostly used with words and not to replace them. So we’re more likely to text “I love you ❤” than “I ❤ you.”


The Best “Robot”

The best performing model was the char-BLSTM with pretrained vectors on the 20-emojis. Apparently frequency has a lot to do with it. It shouldn’t be surprising that the model predicts the most frequent emojis more frequently. So in a case where the word love is used with the 💕, the model would prefer ❤. Also the model confuses emojis that are used in high frequency and varied contexts. 😂 and 😭 are an example of this. They’re both used in contexts with a lot of exclamation points, lols, hahas, and omgs and often with irony.

The case of 🎄 was interesting. There were only 3 in the test set and the model correctly predicted it in the two occasions where the word Christmas was in the tweet. The one case without it didn’t get the correct prediction from the model.

Second experiment: 🙍🏽vs 🤖

The second experiment was to compare human performance to the character-based representation BLSTM. These humans were asked to read a tweet with the emoji removed and then to guess which emoji of five emojis (😂, ❤, 😍, 💯, 🔥and ) fit.

They crowdsourced it. And guess what? The char-BLSTM won! It had a hard time with 😍 and 💯 and humans mainly messed up 💯 and 🔥. For some reasons, humans kept putting in 🔥 where it should have been 😂. Probably the char-BLSTM didn’t do that as much because of its preference for high frequency emojis.

5 Conclusion

  • The BLSTMs outperformed the other models and the humans. Which sounds a lot like a terminator-style emoji-ocalypse to me. This paper not only suggests that an automatic emoji prediction tool can be created, but also that it may predict emojis better than humans can and that there is a link between word sequences and emojis. But because different communities use them differently and because they’re not usually playing the role of words necessarily, it’s excessively difficult to define their semantic roles not to mention their “definitions.” And while there are some lofty attempts (notably Emojipedia and The Emoji Dictionary) to “define” them, the lack of consensus makes this basically impossible for the vast majority of them.

I recommend this article to emoji kweens,  computational bishes 💻, curious bishes 🤔, and doomsday bishes 🧟‍♀️.

Thanks to Rachael Tatman for her post “How do we use Emoji?” for bringing some great research to our attention. If you don’t have the stomach for computational methods, but care about emojis, then definitely check out her post.



Barbieri, Francesco, et al. “Are Emojis Predictable?” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017, doi:10.18653/v1/e17-2017.

Dürscheid, C., & Siever, C. M. (2017). “Beyond the Alphabet–Communication of Emojis” Kurzfassung eines (auf Deutsch) zur Publikation eingereichten Manuskripts.

Tatman, Rachael. “How Do We Use Emoji?” Making Noise & Hearing Things, 22 Mar. 2018,

You May also Like...

The public representation of homosexual men in seventeenth-century England – a corpus based view
June 28, 2018
“Building a thick skin for each other” The use of ‘reading’ as an interactional practice of mock impoliteness in drag queen backstage talk
June 22, 2018
‘First things first, Im the realest’: Linguistic appropriation, white privilege, and the hip-hop persona of Iggy Azalea
January 1, 2018

Leave a Reply

Your email address will not be published. Required fields are marked *