Tag: emojis

Emoji Grammar as Beat Gestures

If you’re a Lingua Bish, you probably know about celebrity linguists Gretchen McCulloch 😻 and Dr. Lauren Gawne 😻. At the 1st International Workshop on Emoji Understanding and Applications in Social Media (June 2018), they presented their research to answer the question once and for all: are emojis a language 🤔? Well, actually, Gretchen and Lauren always use emoji as the plural rather than emojis (bishes don’t), and their research question was “If languages have grammar and emoji are supposedly a language, then what is their grammar?”

If you try to compare emojis to language, the closest you’ll get is the word level. Of all the bits of a language, emojis are most similar to words, but language is so much more than a bunch of words: it has parts of speech and structure (and so many other things). Emojis often affect the tone of a text or add a layer of emotion 😏, but Lauren and Gretchen think that’s just a small part of the picture, because their effect isn’t always straightforward. To compare emojis to words, they decided to look at the most used word sequences and compare them to the most used emoji sequences. They hypothesized that if emoji sequences are mostly repetitions, emojis should be considered “beat” gestures. But what is that even?

Beat Gestures and Emojis

So gestures are a different type of communication🖐. They are not a language and they don’t have grammar. 

[Image: a beat gesture, and definitely cool]

One type of gesture is the “beat” gesture. It is characterized by its absence of meaning and its repetitive nature. You use beat gestures when you talk with your hands👐 and most gestures politicians make during speeches are beat gestures.

[Image: not cool, and not a beat gesture]

However, when a really cool person bobs their open palms up and down in the air above their head, you know it means “raise the roof”, so this is not a beat gesture. It seems like emojis act the same way as beat gestures, often repetitive and often with no inherent meaning unless accompanied by words🤯.

The Emoji Corpus

Gretchen and Lauren used a SwiftKey emoji corpus to check out sequences of two, three, and four emojis. That means they looked for groups of emojis that often appear together. They pulled out the 200 most common sequences and noticed that the top sequences used just one repeated emoji.

[Image: the top 10 sequences in the SwiftKey emoji corpus]
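Just to make the method concrete, here’s a minimal sketch in Python of how you’d count emoji sequences and check how many of the top ones are repetitions. The toy messages are made up (the SwiftKey corpus isn’t public), so this is only the shape of the method, not their pipeline:

```python
from collections import Counter

# Toy stand-in for the SwiftKey corpus: emoji-only messages.
messages = ["😂😂😂", "❤❤", "🙈🙉🙊", "😂😂", "💕💕💕💕", "😭😭"]

def ngrams(seq, n):
    """All contiguous runs of n emojis in a message."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

counts = Counter()
for msg in messages:
    emojis = list(msg)  # one code point per emoji in this toy data
    for n in (2, 3, 4):
        counts.update(ngrams(emojis, n))

# Print the most common sequences and flag pure repetitions.
for seq, freq in counts.most_common(10):
    print("".join(seq), freq, "(repeated!)" if len(set(seq)) == 1 else "")
```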

The Word Corpus

Then they used the Corpus of Contemporary American English (COCA) to check out word sequences to compare with the emoji sequences. The COCA contains around 500 million words from things like news outlets and websites 👩‍💻. In the 200 most common word sequences, they found almost no repetition. The only repeated-word sequences at all were “had had” and “very very very,” and even those didn’t make the top 200. And yes, that could just be because the COCA is formal, and perhaps a corpus of informal language would have yielded different results. For example, you might get instances of what linguists call contrastive focus reduplication, a.k.a. the “salad-salad” construction (Ghomeshi et al. 2004), as in “it’s salad salad🥗, not ham salad or jello salad.” It’s the same as “OMG you like like them 😲??” or “It’s Saturday. Tonight I’m going out out💃,” but this bish is digressing.

Comparing Words to Emojis

The point is, where words are very rarely repeated in a sequence, it appears that emojis are. You’re probably like, “but I send 2-4 emojis at a time and they don’t repeat.” Ya, you might, but I bet they’re pretty similar, like 5 different hearts💝💘💖💗💓, or the no-evil monkeys🙈🙉🙊, or allll the dranks🍾🍹🍸🥃🍷🥂🍺. So ya, sometimes they’re all different, but if so, they’re likely on a theme.

But even though emojis can be more repetitive than speech or writing, most emojis occur next to words and not in sequences. Even where emojis occur without words, it’s mostly just one or two at a time and usually in response to a previous message. Guess who else usually partners with words? You guessed it, beat gestures👊! 

It seems like emojis and beat gestures have a lot in common. Let’s list the ways: 

  1. no grammatical structure
  2. no inherent meaning unless accompanied by words
  3. often repeated
  4. often add emphasis

Maybe emojis and beat gestures should get a room already 👉👌😜.

Conclusion

Basically the idea is just to shift the way we think of emojis. Thinking of them as a new language with grammar won’t get research far. Gretchen and Lauren might be onto something by considering emojis to be a type of gesture. Emojis don’t have their own grammar, but they work with our written grammar, adding emphasis just like beat gestures do with our spoken grammar. So it’s unlikely that emojis will ever be a full language. If they ever start exhibiting structural regularities in corpus studies, though, and start languagifying, I’m sure Gretchen and Lauren will be there to catch it.

This paper is great for emoji bishes👯, anyone who texts📱, corpus bishes, and lingthusiasts👸🏻👸🏿👸🏼👸🏾.

——————————————————————————————————–

McCulloch, Gretchen, and Lauren Gawne. “Emoji Grammar as Beat Gestures.” In: S. Wijeratne, E. Kiciman, H. Saggion, A. Sheth (eds.): Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25-JUN-2018, published at https://ceur-ws.org

Ghomeshi, Jila, et al. “Contrastive Focus Reduplication in English (The Salad-Salad Paper).” Natural Language & Linguistic Theory, vol. 22, no. 2, 2004, pp. 307–357, doi:10.1023/b:nala.0000015789.98638.f9.


Are Emojis Predictable?

Emojis are cool, right? Well, typing that sure didn’t feel cool, but whatever. The paper “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion explores the relationship between words and emojis by creating robot-brains that can predict which emojis humans would use in emoji-less tweets.

But what exactly are emoji (also, is the plural emoji or emojis?) and how do they interact with our text messaging? Gretchen McCulloch says you can think about them like gestures. So if I threaten you by dragging a finger across my throat IRL, a single emoji of a knife might do the trick in a text. But if they act like gestures in some cases, what are we to make of the unicorn emoji? Or the zombie? It’s not representative of eating brains, right? Right?? Tell me the gesture isn’t eating brains!

So, obviously, trying to figure out what linguistic roles emoji can play is tough, and it doesn’t help that they haven’t been studied all that much from a Natural Language Processing (NLP) perspective. Not to mention the perspective of AI. Will emoji robots take over the world like the post-apocalyptic dystopian hellscapes depicted in movies like… the Emoji Movie and… Lego Batman? Studying emojis will not only protect us from the emoji-ocalypse, but also help analyze social media content and public opinion. That’s called sentiment analysis btw, but more on all the things I just tried to learn later.

The Study (or Machine Learning Models, oh my 😖)

For this study, the researchers (from my alma mater, Universitat Pompeu Fabra) used the Twitter APIs to determine the 20 most frequently used emojis in 40 million tweets from the US between October 2015 and May 2016. Then they selected only those tweets that contained a single emoji from the top-20 list, which left more than 584,600 tweets. Then they removed the emoji from each tweet and trained machine learning models to predict which one it was. Simple, right?
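For flavor, here’s roughly what that filtering step could look like in Python. This is a hedged sketch, not the authors’ code: the tweets and the top-emoji list below are made up, and I’ve only included single-code-point emojis so simple character matching works:

```python
# Illustrative subset of a "top 20" list -- NOT the paper's actual list.
TOP_EMOJIS = set("❤😍😂💯🔥😊😘✨💕😉")

def make_example(tweet):
    """Keep tweets with exactly one top-list emoji; return (text, label)."""
    found = [ch for ch in tweet if ch in TOP_EMOJIS]
    if len(found) != 1:
        return None
    label = found[0]
    text = tweet.replace(label, "").strip()  # remove the emoji from the tweet
    return text, label

tweets = ["pizza night with the squad 😂", "😂😂 no way", "good morning ☕"]
examples = [ex for ex in map(make_example, tweets) if ex]
print(examples)  # [('pizza night with the squad', '😂')]
```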

Now just to be clear, the methods in this study are way above my head. I don’t want anyone confusing me for someone who understands exactly what went on here because I was fully confused through the entire methods section. I tried to summarize what little understanding I think I walked away with, but found there was just way too much content. So here is a companion dictionary of terms for the most computationally thirsty bishes (link).

So actually two experiments were performed. The first was comparing the abilities of different machine learning models to predict which emoji should accompany a tweet. And the second was comparing the performance of the best model to human performance.

The Robot Face-Off (🤖 vs 🤖)

In the first experiment, the researchers removed the emoji from each tweet. Then they used 5 different models (see the companion dictionary for more info, and the baseline sketch after this list) to predict what the emoji had been:

  1. A Bag of Words model
  2. Skip-Gram Average model
  3. A bidirectional LSTM model with word representations 
  4. A bidirectional LSTM model with character-based representations 
  5. A skip-gram model trained with and without pre-trained word vectors
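To give you a feel for what a “baseline” means here, this is a minimal Bag-of-Words classifier in scikit-learn: a stand-in for model 1 trained on made-up tweets, not the authors’ actual implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: tweet text -> the emoji it came with.
texts = ["love you so much", "this is hilarious lol", "omg crying rn", "love this song"]
labels = ["❤", "😂", "😭", "❤"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["i love it"]))  # likely ['❤'], since 'love' points there
```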

They found that the last three (the neural models) performed better than the first two (the baselines). From this they drew the conclusion that emojis collocate with specific words. For example, the word love collocates with ❤. I’d also like to take a moment to point out the Dürscheid & Siever study in the references, which points out that emojis are mostly used with words and not to replace them. So we’re more likely to text “I love you ❤” than “I ❤ you.”
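You can see collocation for yourself with nothing fancier than co-occurrence counts. A toy sketch (made-up tweets again):

```python
from collections import Counter, defaultdict

tweets = [("i love you", "❤"), ("love this song", "❤"), ("lol dying", "😂")]

# For each emoji, count the words that appear alongside it.
cooc = defaultdict(Counter)
for text, emoji in tweets:
    cooc[emoji].update(text.split())

print(cooc["❤"].most_common(1))  # [('love', 2)] -- 'love' collocates with ❤
```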


The Best “Robot”

The best performing model was the char-BLSTM with pretrained vectors on the 20-emoji set. Apparently frequency has a lot to do with it: unsurprisingly, the model prefers the most frequent emojis. So in a case where the word love is used with 💕, the model would still pick ❤. The model also confuses emojis that are used at high frequency and in varied contexts, like 😂 and 😭. They’re both used in contexts with a lot of exclamation points, lols, hahas, and omgs, and often with irony.
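The standard way to spot that kind of mix-up is a confusion matrix, where each row is the true emoji and each column is what the model guessed. A sketch with made-up predictions, just to show the technique:

```python
from sklearn.metrics import confusion_matrix

gold = ["😂", "😭", "😂", "❤", "😭", "😂"]
pred = ["😂", "😂", "😂", "❤", "😂", "😭"]

# Rows = true emoji, columns = predicted emoji.
print(confusion_matrix(gold, pred, labels=["😂", "😭", "❤"]))
# The off-diagonal 😭-row/😂-column cell counts the tear-face swaps.
```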

The case of 🎄 was interesting. There were only 3 in the test set, and the model correctly predicted it on the two occasions where the word Christmas was in the tweet. It missed the one tweet that didn’t mention Christmas.

Second Experiment: 🙍🏽 vs 🤖

The second experiment compared human performance to the character-based BLSTM. Humans were asked to read a tweet with the emoji removed and then to guess which of five emojis (😂, ❤, 😍, 💯, or 🔥) fit.

They crowdsourced it. And guess what? The char-BLSTM won! It had a hard time with 😍 and 💯, while humans mainly messed up 💯 and 🔥. For some reason, humans kept putting 🔥 where it should have been 😂; the char-BLSTM probably didn’t do that as much because of its preference for high-frequency emojis.

Conclusion

The BLSTMs outperformed the other models and the humans, which sounds a lot like a Terminator-style emoji-ocalypse to me. This paper not only suggests that an automatic emoji prediction tool can be created, but also that it may predict emojis better than humans can, and that there is a link between word sequences and emojis. But because different communities use them differently, and because they’re not usually playing the role of words, it’s excessively difficult to define their semantic roles, not to mention their “definitions.” And while there are some lofty attempts (notably Emojipedia and The Emoji Dictionary) to “define” them, the lack of consensus makes this basically impossible for the vast majority of them.

I recommend this article to emoji kweens, computational bishes 💻, curious bishes 🤔, and doomsday bishes 🧟‍♀️.

Thanks to Rachael Tatman, whose post “How do we use Emoji?” brought some great research to our attention. If you don’t have the stomach for computational methods but care about emojis, then definitely check out her post.


Barbieri, Francesco, et al. “Are Emojis Predictable?” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017, doi:10.18653/v1/e17-2017.

Dürscheid, C., and C. M. Siever. “Beyond the Alphabet–Communication of Emojis.” 2017. Short version of a manuscript (in German) submitted for publication.

Tatman, Rachael. “How Do We Use Emoji?” Making Noise & Hearing Things, 22 Mar. 2018, makingnoiseandhearingthings.com/2018/03/17/how-do-we-use-emoji/.


Companion to “Are Emojis Predictable?”

Welcome to the companion to “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion.

This is where I’ve attempted to provide some semblance of an explanation of the study’s methods. Look, I tried my best with this, so don’t judge. I ordered the terms by the difficulty I had with them instead of alphabetically. References at the end for thirsty bishes who just can’t get enough.

Each term below is rated by the difficulty I had with it, from 😀 (easy) to 🤮 (send help).

😀 Sentiment Analysis

A way of determining and categorizing the opinions and attitudes in a text using computational methods. Also called opinion mining.

☺️ Neural Network

A computing system that’s loosely based on how the human brain works.

🙂 Recurrent Neural Network

A type of neural network that can be trained by algorithms and that stores information from earlier in a sequence to make context-based predictions. Also called an RNN.

🙂 Bag of Words

A model that basically counts up the number of instances of words in a text. It’s good at classifying texts by word frequencies, but because it determines words by the white space surrounding them and disregards grammar and word order, phrases lose their meaning. Also called BoW.
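Here’s the word-order blindness in two lines, assuming scikit-learn’s CountVectorizer as the BoW implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
X = vec.fit_transform(["dog bites man", "man bites dog"])
print(vec.get_feature_names_out())  # ['bites' 'dog' 'man']
print(X.toarray())  # [[1 1 1] [1 1 1]] -- opposite meanings, identical vectors
```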

😐 Skip Gram

A neural network model that does sort of the opposite of BoW. Instead of looking at the whole text at once, the skip gram considers word pairs separately: it tries to predict the context from a word, weighing closer words more heavily than further ones, so word order actually matters. It’s the model behind Word2Vec.
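If you want to poke at one, gensim’s Word2Vec trains a skip-gram model when you set sg=1. A toy sketch (the corpus is way too small for the vectors to mean anything, but the API is real):

```python
from gensim.models import Word2Vec

sentences = [["i", "love", "pizza"], ["i", "love", "salad"], ["pizza", "is", "life"]]

# sg=1 selects the skip-gram architecture (sg=0 would be CBOW).
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)
print(model.wv.most_similar("pizza", topn=2))
```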

😐 Long Short-term Memory Network

A recurrent neural network that can learn the order of items in sequences and so can predict them. Also called an LSTM.

😑 Bidirectional Long Short-term Memory Network

The same as above, but it’s basically time travel, because half the neurons search backwards through the sequence and half search forwards, even as more items are added. Also called a BLSTM.
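In code, “bidirectional” is literally one flag. A minimal PyTorch sketch of the architecture family (not the authors’ exact model):

```python
import torch
import torch.nn as nn

# 64 hidden units per direction; bidirectional doubles the output size.
lstm = nn.LSTM(input_size=32, hidden_size=64, bidirectional=True, batch_first=True)
x = torch.randn(1, 10, 32)  # (batch, 10 timesteps, 32 features each)
out, _ = lstm(x)
print(out.shape)  # torch.Size([1, 10, 128]) -- 64 forward + 64 backward
```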

😓 Char-BLSTM

A character-based approach that learns representations for words that look similar, so it can handle variants of the same word type. More accurate than the word-based variety.

😖 Word-BLSTM

The word-based variant of the above: it reads the tweet as a sequence of words instead of characters. Probably.

🤮 Word Vector

Ya, this one is umm… well, you see, it has magnitude and direction. And like, you have to pre-train it. So… “Fuel your lifestyle with .”
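“Magnitude and direction” is less mysterious in code: a word vector is just an array of numbers, and similarity between words is the cosine of the angle between their arrays. Toy vectors here, not pre-trained ones:

```python
import numpy as np

love = np.array([0.9, 0.1, 0.3])
adore = np.array([0.8, 0.2, 0.25])

# Cosine similarity: dot product divided by the product of magnitudes.
cosine = love @ adore / (np.linalg.norm(love) * np.linalg.norm(adore))
print(round(cosine, 3))  # close to 1.0 -> similar direction, similar meaning
```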

Congratulations if you’ve made it this far! You probably already know more than me. Scream it out. I know I did 🙂


REFERENCES

Bag of Words (BoW) – Natural Language Processing, ongspxm.github.io/blog/2014/12/bag-of-words-natural-language-processing/.

Britz, Denny. “Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.” WildML, 8 July 2016, www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/.

Brownlee, Jason. “A Gentle Introduction to Long Short-Term Memory Networks by the Experts.” Machine Learning Mastery, 19 July 2017, machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/.

Brownlee, Jason. “A Gentle Introduction to the Bag-of-Words Model.” Machine Learning Mastery, 21 Nov. 2017, machinelearningmastery.com/gentle-introduction-bag-words-model/.

Chablani, Manish. “Word2Vec (Skip-Gram Model): PART 1 – Intuition.” Towards Data Science, 14 June 2017, towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b.

Verwimp, Lyan, et al. “Character-Word LSTM Language Models.” Cornell University Library, 10 Apr. 2017, arxiv.org/abs/1704.02813.

Olah, Christopher. “Understanding LSTM Networks.” Colah’s Blog, colah.github.io/posts/2015-08-Understanding-LSTMs/.

Nielsen, Michael. Neural Networks and Deep Learning. Determination Press, neuralnetworksanddeeplearning.com/chap1.html.

“Sentiment Analysis: Concept, Analysis and Applications.” Towards Data Science, 7 Jan. 2018, towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17.

gk_. “Text Classification Using Neural Networks.” Machine Learnings, 26 Jan. 2017, machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6.

Thireou, T., and M. Reczko. “Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, 2007, pp. 441–446., doi:10.1109/tcbb.2007.1015.

“Vector Representations of Words  | TensorFlow.” TensorFlow, www.tensorflow.org/tutorials/word2vec.

“Word2Vec Tutorial – The Skip-Gram Model.” Chris McCormick, mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.
