Author: Caitlin O'Connell

Emoji Grammar as Beat Gestures

If you’re a Lingua Bish, you probably know about celebrity linguists Dr. Gretchen McCulloch😻 and Dr. Lauren Gawne😻. At the 1st International Workshop on Emoji Understanding and Applications in Social Media in June 2018, they presented research that set out to answer the question once and for all: Are emojis a language? 🤔 But actually, Gretchen and Lauren always use emoji as the plural of emoji (bishes don’t), and their real research question was “If languages have grammar and emoji are supposedly a language, then what is their grammar?”

If you try to compare emojis to language, the closest you’ll get is words. Of all the bits of a language, emojis are most similar to words, but language is so much more than a bunch of words. It has parts of speech and structure (and so many other things). Emojis often affect the tone of text or add a layer of emotion😏, but Lauren and Gretchen think that’s just a small part of it, because their effect isn’t always straightforward. To compare emojis to words, they decided to look at the most common word sequences and compare them to the most common emoji sequences. They hypothesized that if emoji sequences are repeated, they should be considered “beat” gestures, but what even is that?

Beat Gestures and Emojis

So gestures are a different type of communication🖐. They are not a language and they don’t have grammar. 

a beat gesture and definitely cool

One type of gesture is the “beat” gesture. It is characterized by its absence of meaning and its repetitive nature. You use beat gestures when you talk with your hands👐 and most gestures politicians make during speeches are beat gestures.

not cool and not a beat gesture

However, when a really cool person bobs their open palms up and down in the air above their head, you know it means “raise the roof”, so this is not a beat gesture. It seems like emojis act the same way as beat gestures, often repetitive and often with no inherent meaning unless accompanied by words🤯.

The Emoji Corpus

Gretchen and Lauren used a SwiftKey emoji corpus to check out sequences of two, three, and four emojis, meaning they looked for groups of emojis that often appear together. They looked at the 200 most common sequences and noticed that the top sequences were just one emoji repeated. These were the top 10 sequences in the SwiftKey emoji corpus:

The Word Corpus

Then they used the Corpus of Contemporary American English (COCA) to check out word sequences to compare with the emoji sequences. The COCA contains around 500 million words from things like news outlets and websites👩‍💻. In the 200 most common word sequences, they found no repetition at all. The only repeated words they turned up were in “had had” and “very very very,” and even those didn’t make the top 200. And yes, that could just be because the COCA is formal, and a corpus of informal language might have yielded different results. For example, you might get instances of what linguists call contrastive focus reduplication, aka “salad-salad reduplication” (Ghomeshi et al. 2004), as in “it’s salad salad🥗, not ham salad or jello salad.” It’s the same as “OMG you like like them 😲??” or “It’s Saturday. Tonight I’m going out out💃,” but this bish is digressing.

Comparing Words to Emojis

The point is, where words are very rarely repeated in a sequence, it appears that emojis often are. You’re probably like, “but I send 2-4 emojis at a time and they don’t repeat.” Ya, you might, but I bet they’re pretty similar, like 5 different hearts💝💘💖💗💓, or the see-no-evil, hear-no-evil, speak-no-evil monkeys🙈🙉🙊, or allll the dranks🍾🍹🍸🥃🍷🥂🍺. So ya, sometimes they’re all different, but if so, they’re likely on a theme.
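If you wanted to poke at this repetition idea yourself, the core move is plain n-gram counting. Here’s a minimal Python sketch, assuming each message has already been split into one emoji (or word) per token. The toy messages are made up (not SwiftKey or COCA data), and the “one repeated item” check is my own simplification, not necessarily how Gretchen and Lauren coded their sequences.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(messages, sizes=(2, 3, 4), k=200):
    """Count 2-, 3-, and 4-grams across all messages; return the k most common."""
    counts = Counter()
    for tokens in messages:
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts.most_common(k)

def repeated_share(top):
    """Share of sequences made of one item repeated, like ('😂', '😂')."""
    if not top:
        return 0.0
    return sum(1 for seq, _ in top if len(set(seq)) == 1) / len(top)

# Made-up toy messages, not the SwiftKey or COCA data. list() splits a string
# by code point, so multi-code-point emoji (skin tones, ZWJ sequences) would
# need a proper grapheme splitter.
emoji_msgs = [list("😂😂😂"), list("😭😭"), list("😂😂"), list("🙈🙉🙊")]
word_msgs = [s.split() for s in ["i do not know", "i had had enough", "do not go"]]

print(repeated_share(top_ngrams(emoji_msgs)))  # 0.5
print(repeated_share(top_ngrams(word_msgs)))
```

On the toy data, half of the emoji n-grams are one emoji repeated, versus one in fourteen for the word n-grams (that lonely “had had”).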

But even though emojis can be more repetitive than speech or writing, most emojis occur next to words and not in sequences. Even where emojis occur without words, it’s mostly just one or two at a time and usually in response to a previous message. Guess who else usually partners with words? You guessed it, beat gestures👊! 

It seems like emojis and beat gestures have a lot in common. Let’s list the ways: 

  1. no grammatical structure
  2. no inherent meaning unless accompanied by words
  3. often repeated
  4. often add emphasis

Maybe emojis and beat gestures should get a room already 👉👌😜.

Conclusion

Basically the idea is just to shift the way we think of emojis. Thinking of them as a new language with grammar won’t get research far. Gretchen and Lauren might be on to something by considering emojis to be a type of gesture. Emojis don’t have their own grammar, but they work with our written grammar. They add emphasis, just like beat gestures do with our spoken grammar. So, it’s unlikely that emojis can ever be a full language. If they ever start exhibiting structural regularities in corpus studies though, and start languagifying, I’m sure Gretchen and Lauren will be there to catch it.

This paper is great for emoji bishes👯, anyone who texts📱, corpus bishes, and lingthusiasts👸🏻👸🏿👸🏼👸🏾.

——————————————————————————————————–

In: S. Wijeratne, E. Kiciman, H. Saggion, A. Sheth (eds.): Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25-JUN-2018, published at https://ceur-ws.org

Ghomeshi, Jila, et al. “Contrastive Focus Reduplication in English (The Salad-Salad Paper).” Natural Language & Linguistic Theory, vol. 22, no. 2, 2004, pp. 307–357, doi:10.1023/b:nala.0000015789.98638.f9.

Prospects and Challenges of Short-Term Historical Lexicography

My favorite publication is American Speech, a quarterly journal published by Duke University Press. Yes, it’s a little Anglo-centric, but it has my favorite recurring feature, Among the New Words. I developed a very close relationship with this feature through my master’s thesis, when I used it to comb through and analyze 10 years’ worth of “new words.” That’s around 2,500 words, and it was an arduous, tedious, fantastic dictionary wonderland that was totally the best and the worst.

Among the New Words, hereafter referred to as ATNW, has the lofty mission of documenting new words and new uses of words in real time. It is a totally non-traditional style of lexicography. It’s been running regularly since 1941 but had earlier incarnations going back to 1937. In its nearly 80 years, ATNW has gone from reader submissions to the internet age. Ben Zimmer and Charles E. Carson decided to look at ATNW’s history and consider its future in the most exciting paper I’ve read all year: Prospects and Challenges of Short-Term Historical Lexicography (2018).

How it all started

In 1933, an English teacher slash Jewish immigrant (slash, from his awesome name, I can only assume, refugee from Mordor), Isidor Colodny, started publishing a monthly magazine called Words: A periodical devoted to the study of the origin, history, and etymology of English words. I guess this is the kind of thing people did before Instagram. A couple of years later, Isidor (Lord of the 8th ring, probably) enlisted Dwight Bolinger, a Spanish teacher with a Ph.D., to write a column called “The Living Language.”

Bolinger noted a very important part of word collection in his introduction to the very first column. He pointed out that new words are often

“…transitory, so that they leave no mark upon the dictionary; and even those which are fortunate enough to make their way into that solemn repository are usually not recorded in such a way as to show just how they came into being, what was their original context, what suggestive power they may have had aside from their literal meaning…”

Which was basically the premise of my whole thesis btdubs. Also, “that solemn repository” is totally what I’m calling dictionaries from now on.

So Bolinger’s original method for The Living Language was to have readers submit new words and words they found being used in new ways. They were also asked to include information about coinages (unrealistic goals much?). Even with modern resources, we usually can’t accomplish that. Zimmer and Carson use Bolinger’s entry for “hootenanny” as an example of the difficulty of dating coinages pre-internet. Bolinger dated its first use to 1935, but the internet tells us it was used as early as 1906.

Nevertheless, his column reached a broad audience, including American Speech co-founder H. L. Mencken. Bolinger was invited to join American Speech, and in 1941 he renamed his column Among the New Words. A man before his time, he eschewed traditional domestic American life for an international, 3D immersive, freelance experience teaching in Costa Rica and performing his American Speech duties remotely.

Bolinger’s neologism-spotting skills were on point. He wrote about -worthy from jump. He noticed that we had gone from seaworthy and trustworthy to all kinds of new worthies, like newsworthy, courtworthy, and creditworthy, par example. That was 1941. Now we have such beauties as Oscar-worthy, cringe-worthy, and meme-worthy.

Another thing he got right was that we create new words by pronouncing onomatopoeia as words. He noted ahem and tisk. And that’s totally carried on. Just think of nom nom.

How it all changed

Bolinger passed the torch in 1944, and ATNW saw a series of new editors. For the publication’s 50th anniversary, Adele Algeo and her husband John (who were running ATNW at the time) produced a commemorative edition with an overview of the different processes that had been used to document new words. Inspired by this, its editors at the time (Zimmer and Carson, plus Solomon) did another retrospective for its 75th anniversary.

A lot of methods were used over the years. In the beginning, there was a lot of reading and a lot of submissions by readers. In 1997, Wayne Glowka chose the “ask the kids” method by roping in his undergrad students for credit. Also, in the 90s, an amazing new method was created. It was called “electronic database searching.” So, I don’t know… Encarta perhaps? And since 2009, or “The Year of the Tweet” as I call it, access to language has changed. The inundation of language from all social media platforms has made tracking neologisms less a matter of collection and more a matter of curation (Zimmer and Carson 2018).

Another cool update is that the publication went digital in 2010. So now when describing a new word, writers can include links to digital media like TV, speeches, music videos, and memes.

The challenges

More access to IRL language use is awesome, but it’s also mo’ words, mo’ problems. Ya gotta have a system for using search engines and determining what’s real and what’s just a Google algorithm. So, let’s talk about ratchet, shall we? It was the American Dialect Society’s word of the year in 2012. ATNW’s initial treatment of it included these four senses:

  1. (insult) adj Over the top, to the extreme, beyond socially acceptable -1999
  2. (insult) n Woman who is ratchet (as in sense 1) -1999
  3. (neutral or positive) n Type of dance in Shreveport, La., or subgroup of rap music associated with the dance -2004
  4. (positive) adj Excellent, wildly fun, exceeding expectations -2007

According to ATNW’s initial entry, it all basically started with one kickass grandmother in Shreveport, Louisiana. That’s right, innovative wordsmith Anthony Mandigo allegedly used a word he’d learned from his granny as the title of his hot new track to usher in a new style of rap, Ratchet Rap. ATNW speculated that the word could have come from “wretched.”

But wait! After publication, a reader found an earlier use of the word (that’s called antedating btw). It was used in its first sense in the 1992 song “I’m So Bad” by UGK, a delightful ditty about S-ing one’s own D as far as I can tell. UGK was from Texas. To this day, that’s all ATNW knows.
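Antedating is basically bookkeeping: for each word and sense, keep the earliest dated citation you know of, and bump it whenever someone turns up an older one. Here’s a minimal Python sketch; the Citation record and the function names are my own invention (not anything ATNW actually uses), and the dates come from the ratchet senses above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    word: str
    sense: int    # sense number, following the numbered list of senses
    year: int
    source: str

def first_attestations(citations):
    """Return {(word, sense): earliest Citation} from a list of dated citations."""
    earliest = {}
    for c in citations:
        key = (c.word, c.sense)
        if key not in earliest or c.year < earliest[key].year:
            earliest[key] = c
    return earliest

def antedates(new, earliest):
    """True if a newly found citation is earlier than the current first attestation."""
    current = earliest.get((new.word, new.sense))
    return current is None or new.year < current.year

citations = [
    Citation("ratchet", 1, 1999, "ATNW initial entry"),
    Citation("ratchet", 3, 2004, "ATNW initial entry"),
]
earliest = first_attestations(citations)
reader_find = Citation("ratchet", 1, 1992, "UGK song I'm So Bad")

print(antedates(reader_find, earliest))  # True
updated = first_attestations(citations + [reader_find])
print(updated[("ratchet", 1)].year)  # 1992
```

The code only moves a date; it can’t tell you whether UGK was actually the first, which is exactly the problem described next.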

All of this illustrates that you can’t just do a Google search and call it a dictionary. If the ATNW editors had been listening to rap in 1992 Texas, they would have been able to write a much more informed entry. Clearly, people have been using ratchet since before 1992; UGK didn’t make it up. It is also a lesson in diversity and inclusion because, stop me if I’m wrong, but I have an image in my head of what the editors of ATNW and those solemn repositories have traditionally looked like, what kind of music they’ve listened to, and which regional dialects they’ve used, and I’m willing to bet “ratchet” wasn’t in their lexicons.

So, when you search “electronic databases” and the like, you need to thoroughly investigate the source (time and place) and look for whoever was producing content at that time. Rarely are words coined out of the blue, so if you can’t find any more instances of the word, call a friend. Someone you know knows someone who knows someone from that area. Sherlock the heck out of that shit!

This article is great for historical linguistics bishes, lexicography bishes, and Ben Zimmer stans. 

—————————————————————————————————————————

Zimmer, Benjamin, and Charles E. Carson. “‘Among the New Words’: The Prospects and Challenges of Short-Term Historical Lexicography.” Dictionaries: Journal of the Dictionary Society of North America, vol. 39, no. 1, 2018, pp. 59–74, doi:10.1353/dic.2018.0010.

Joining the Western Region: Sociophonetic Shift in Victoria

One thing that’s always bothered me is the lack of language documentation in rural Canada. Studies of Canadian English represent urban areas. And look, I get it: rural Canadians are spread thinly across the true north strong and free. Most people live in the urban centers, and documenting Canada’s rural dialects would be kind of a big deal. But that means that any claims about “BC English” are really about speakers in Vancouver, and even though the population of the city is really diverse, linguistic studies there typically aren’t.

Google says it takes about 18 hours to drive from Vancouver to Fort Nelson.

Even if these studies were more diverse, we’d be no closer to understanding how people in, like, Fort Nelson speak.

And what bothers me about that is that something like 20% of Canada’s population lives rurally. We don’t know what they sound like or what they’re saying to each other.

All of that’s a rant for another time, but you might understand how excited I was when I came across Rebecca Roeder, Sky Onosson, and Alexandra D’Arcy’s paper “Joining the Western Region: Sociophonetic Shift in Victoria” (2018), which looks at the way some British Columbians who are not Vancouverites talk. While Victoria isn’t exactly rural, it is definitely not Vancouver, and sometimes that’s enough.

The Study

The purpose of this study was to try to describe the English in Victoria, BC, something this intrepid trio of scholars has been working on for a long time. They used certain linguistic features to conduct a study of language change over time. The participants were 14-98 years old and from diverse backgrounds. 

Speaking of backgrounds, can anyone guess where the first Final Destination was filmed?

Victoria (slash “hi mom!”)

Victoria is sort of a mini Victorian-era England. It’s on what we call The Island, a short ferry ride or flight from Vancouver. It’s an isolated city of around 370,000 people. It was a Hudson’s Bay trading post in 1843, became a city in 1862, and became the capital of the province in 1871. Private school teachers were imported from England right up to WWII, setting the bar for the prestige dialect. And just picture this: I said it was on an island, right? Ya, well, it didn’t get regular ferry service to the mainland until 1960. Even though there are now people who commute regularly to the mainland, I have met people who have never been off the island. And I didn’t know this, but the particularly British-y area of Victoria is referred to as the “tweed curtain.” It’s a small, wealthy community with a marina and tea shops (RIP The Blethering Place, tea shop of yore). One would think this modern history of isolation would have some effect on the dialect, no? Well, yes: apparently there’s some kind of accent there, though its features vary and the population that exhibits them is an aging minority.

Methods

The inquisitive trio used the Synchronic Corpus of Victoria English (SCVE), part of the Victoria English Archive, which comprises 162 interviews with primarily British-descended Victorians. The speakers range from 1st- to 6th-generation Victorians (14 to 98 years old). Some were even related.

For anyone still guessing, maybe “Garrick’s Head” rings a bell

The Sounds

Victoria English 

The Canadian Shift

Ok so, the vowels in lit, #blessed, and sass (the kit, dress, and trap vowels, if you’re new to LinguaBishes) are produced at similar heights in the mouth across many English dialects. “Height” refers to how high or low your tongue is when you make a vowel. In the Canadian Shift, the vowels in kit, dress, and trap started to lower sometime before 1950. This resulted in a really noticeable change among baby boomers. The shift slowed down for Torontonians, but if you’re a Canadian woman under 40, you might be as much as a generation ahead of the guys you know in the shift. The Shift is more recent in Victoria, perhaps because of its relative isolation. Even though it started later, young Victorians speak really similarly to other Canadians, which means there was a whole lotta change in a little bit of time. Also, because of this late start, older Victorians have higher vowels than their peers across the country.
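For the acoustics-curious: vowel height is usually measured with the first formant (F1), and a higher F1 means a lower tongue position, so “lowering” shows up as F1 going up. Since people’s vocal tracts differ in size, formants are normalized per speaker before comparing. Below is a sketch of Lobanov normalization (z-scoring each formant within a speaker), one common choice. I don’t know which normalization Roeder, Onosson, and D’Arcy actually used, and the formant values are invented.

```python
from statistics import mean, stdev

def lobanov(tokens):
    """Z-score F1 and F2 within a single speaker (Lobanov normalization).

    tokens: list of (vowel, f1_hz, f2_hz) from ONE speaker.
    Returns a list of (vowel, z_f1, z_f2).
    """
    f1s = [f1 for _, f1, _ in tokens]
    f2s = [f2 for _, _, f2 in tokens]
    m1, s1 = mean(f1s), stdev(f1s)
    m2, s2 = mean(f2s), stdev(f2s)
    return [(v, (f1 - m1) / s1, (f2 - m2) / s2) for v, f1, f2 in tokens]

def mean_z_f1(normed, vowel):
    """Mean normalized F1 for one vowel. A higher value means a LOWER vowel."""
    return mean(z1 for v, z1, _ in normed if v == vowel)

# Hypothetical measurements (Hz) for one speaker, two tokens per vowel.
speaker = [
    ("kit", 400, 2000), ("dress", 550, 1800), ("trap", 700, 1600),
    ("kit", 420, 1950), ("dress", 570, 1750), ("trap", 720, 1650),
]
normed = lobanov(speaker)
for v in ("kit", "dress", "trap"):
    print(v, round(mean_z_f1(normed, v), 2))
```

Because every speaker’s values get centered and scaled, a 14-year-old and a 98-year-old can be compared on the same footing; scaling one speaker’s formants up by 20% leaves their z-scores unchanged.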

Raising of ban and bag

Y’all have probably heard that in Canada, we say bag [bɛg], not [bæg] (same with dragon, wagon, and rag). Young Canadians in BC and the prairies also do the same thing with ban (see the LinguaBishes Vowel Chart). In Victoria, ban and bag are in the same place regardless of age or gender. This suggests it’s probably a solid Victorian feature that’s at least eighty-five years old.
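“Same place” can be made concrete by comparing where the ban and bag tokens sit on average in F1/F2 space. This sketch uses plain Euclidean distance between the two averages (centroids); a lot of merger research uses a Pillai score from a MANOVA instead, which also accounts for how much the two clouds of tokens overlap. In practice you’d normalize the formants first, and the values here are made up, not SCVE data.

```python
from math import dist
from statistics import mean

def centroid(tokens, label):
    """Average (F1, F2) of all tokens carrying the given label, e.g. 'ban'."""
    pts = [(f1, f2) for lab, f1, f2 in tokens if lab == label]
    return (mean(p[0] for p in pts), mean(p[1] for p in pts))

def centroid_distance(tokens, a, b):
    """Euclidean distance between two vowel centroids; ~0 means same place."""
    return dist(centroid(tokens, a), centroid(tokens, b))

# Hypothetical (F1, F2) values in Hz, not from the SCVE.
tokens = [
    ("ban", 600, 2000), ("ban", 620, 2040),
    ("bag", 610, 2020), ("bag", 610, 2020),
    ("bat", 800, 1700),
]
print(centroid_distance(tokens, "ban", "bag"))  # 0.0
print(round(centroid_distance(tokens, "ban", "bat"), 1))
```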

Back-Vowel Fronting

A back vowel is a vowel you make with your tongue toward the back of your mouth, like in “Karen, that’s some hot goss.” Fronting means making the sounds further toward the front of your mouth. It is very common in British Columbia, and back-vowel fronting is a systemic process in Victoria. Your goats, your boots, and your foots seem to be pronounced slightly further forward in the mouth by women.

Yod

Yod is the insertion of a y sound before a vowel. Tune is a great example. Yodders (many speakers of British dialects) pronounce the word like tyune or even chyune. In American English, yod is disappearing, and in Canada it is disappearing more slowly because it is considered prestigious. Contrary to previous research, this study found yod to be a stable feature in Victoria, but because it appears mainly in the word too, it could be another example of back-vowel fronting.

When did the airport ever look like this?

Results

Our three linguists took their results and compared them to the Phonetics of Canadian English (PCE) data compiled by Boberg (2008). They examined these features in “apparent time,” which means they treated differences between older and younger speakers, all recorded around the same time, as a window onto change over time. Their results were pretty close to Boberg’s PCE, but they found trap to be higher, and lot-thought-palm to be higher and further back, like in Californian English.
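One simple way to put a number on an apparent-time trend, sketched below, is a least-squares slope of a vowel’s normalized F1 against speakers’ birth years (birth year rather than age, so that “later” reads left to right). Real studies usually use regression models with more predictors, like gender and word; this is just the core idea, and the speaker values are made up.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def apparent_time_trend(speakers):
    """speakers: list of (birth_year, mean_normalized_f1) for one vowel.

    A positive slope means later-born speakers have higher F1, i.e. a lower
    vowel, which reads as lowering in apparent time.
    """
    years = [b for b, _ in speakers]
    f1s = [f for _, f in speakers]
    return ols_slope(years, f1s)

# Hypothetical per-speaker means for one vowel.
speakers = [(1920, -0.4), (1950, -0.1), (1980, 0.2), (2000, 0.4)]
print(apparent_time_trend(speakers) > 0)  # True
```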

OK, so… ?

After World War II, the English school teachers stopped arriving in Victoria and regular ferry service started. Victoria opened up and experienced quick population growth. This made it ripe ground for dialect leveling, or “phonological simplification.” This could have been when back-vowel fronting and the vowel shift happened.

And…? 

So Victorians, especially young Victorians, mostly speak the same as the majority of western Canadians. Basically anyone under the age of 80 speaks a variety that has leveled out to include the Canadian shift and back-vowel fronting. 

BUT the whole aforementioned yod situation shows that Victoria English is holding onto its history. That and the ban/bag raising are holdouts that were probably unchanged throughout the 20th century. The low-back merger, on the other hand, could have started in Canadian English around 150 years before it got to Victoria. If that’s true, then Canadian English isn’t a single entity that progressed westward during expansion, but a multi-sourced group of dialects. To me, it says we need more surveys of BC English from areas around the province that aren’t Vancouver.

fin

This article is great for phonetics and acoustical analysis bishes, dialect bishes, Canadian bishes, and of course, Final Destination bishes.


Roeder, R., Onosson, S., & D’Arcy, A. (2018). Joining the Western Region: Sociophonetic Shift in Victoria. Journal of English Linguistics, 46(2), 87–112. doi:10.1177/0075424217753987

Roeder, Rebecca & Onosson, Sky & D’Arcy, Alexandra. (2015). Simultaneous innovation and conservation: Unpacking Victoria’s vowels.

Boberg, C. (2008). Standard Canadian English. Standards of English, 159–178. doi:10.1017/cbo9781139023832.009

The public representation of homosexual men in seventeenth-century England – a corpus based view

Baker and McEnery (2017) wanted to find out what the public representation of gay men was in the 1600s. Of course they weren’t called “gay men” back then, and there was a broad range of male-on-male activity that guys could engage in to be considered anything from a sinner to a sorcerer. So this is actually a look at the public representation of guys who did what Jonathan Van Ness would call “gay stuff.”

Unfortunately, this study only covers gay men because of how little writing exists about other queer people from that very binary time. The approach was to explore how gay men were written about. And let’s remember that gay sex wasn’t just taboo or frowned upon; it was a capital offense, and it was only decriminalized in England and Wales in 1967.

Source

They used the Early English Books Online Corpus version 3 (EEBO v3), which is great, but unfortunately it’s got so much religious stuff (from meeting minutes to plays and journalism) that results are a bit lopsided.

Challenges

As mentioned above, the large number of religious texts skewed the results. For example, sodomite is by far the most frequent term, but it’s mostly used in a Bible-y context (you know, the whole Sodom and Gomorrah thing). The word collocates most consistently with Genesis, filthy, and some guy called Lot, because the Sodom and Gomorrah story is in the part of the Bible called Genesis, the city of Sodom had the cute nickname Filthy Sodom, and Lot was there, I guess. In the Bible-y sense, the word connoted wickedness, sin, and other deeply negative things, but not necessarily gay stuff. So none of that information is particularly relevant to the public perception of gay men in the 1600s.
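Collocation is how you find out that sodomite hangs out with Genesis, filthy, and Lot. One standard way to score collocates is mutual information (MI): how much more often two words show up near each other than you’d expect by chance. Here’s a minimal Python version. Window size, minimum frequency, and the exact MI formula vary between corpus tools, and I’m not claiming these are the settings Baker and McEnery used; the toy sentence is invented.

```python
import math
from collections import Counter

def collocates_mi(tokens, node, span=5, min_freq=2):
    """Rank words within +/- span tokens of `node` by mutual information.

    MI = log2(observed / expected), where
    expected = f(node) * f(collocate) * window / N and window = 2 * span.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                observed[tokens[j]] += 1
    window = 2 * span
    scores = {}
    for coll, o in observed.items():
        if o < min_freq or coll == node:
            continue  # skip rare pairs (MI over-rewards them) and the node itself
        expected = freq[node] * freq[coll] * window / n
        scores[coll] = math.log2(o / expected)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = ("the filthy sodomite fled and the filthy sodomite hid "
          "and the city slept").split()
for word, mi in collocates_mi(tokens, "sodomite", span=2):
    print(word, round(mi, 2))
```

Even in this tiny example, filthy outranks the, because the shows up everywhere and so is “expected” near anything.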

Just as an interesting side note, the word sodomite declined in usage over the century, while at the same time there was a rise in church doubt and anti-Catholic writing. Also, sodomite collocates with harlot and whore, the only apparent link to sex of any kind.
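Claims like “declined over the century” depend on normalized frequency, since a historical corpus probably doesn’t contain the same amount of text for every decade. The usual fix is frequency per million words. A tiny sketch; the hit counts and period sizes are hypothetical, not EEBO-v3 figures.

```python
def per_million(hits, corpus_words):
    """Normalize a raw hit count to a frequency per million words."""
    return hits * 1_000_000 / corpus_words

def trend(counts):
    """counts: {period: (hits, words_in_period)} -> {period: per-million rate}, in period order."""
    return {p: per_million(h, n) for p, (h, n) in sorted(counts.items())}

# Hypothetical counts for one term: raw hits go UP from 40 to 90,
# but the later period has three times as much text.
counts = {1650: (90, 3_000_000), 1600: (40, 1_000_000)}
print(trend(counts))  # {1600: 40.0, 1650: 30.0}
```

Raw counts would suggest the word got more popular; the per-million rates say it actually declined.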

The other thing is that gay stuff was about as marginalized as it gets. There was a ton of censorship, with trial records being destroyed, and there’s no evidence in EEBO-v3 of any man self-identifying as “into dudes,” because he could have been imprisoned, had his wealth seized, or even been put to death. So what remains in writing is heavily prejudiced, negative, religious, based in mythology, and controlled by the homophobic patriarchy.

Finally, there’s the problem of the searching part. Like, what were they even supposed to search the corpus for? They couldn’t search any of our modern terms like homosexual, gay, or queer, so then what? What they did was familiarize themselves with the corpus and combine their own knowledge with words from the Lexicon of Early Modern English (LEME) and the Historical Thesaurus of the Oxford English Dictionary. They also found more words as they went. Armed with all the terms they could find for homosexual men and for male prostitutes who serviced men, they dove in.
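Once you have a term list, the mechanical part of the search can be sketched with regular expressions. Early modern spelling is all over the place, so a real search has to catch variants; the rule below (optional final -e, optional plural -s) is my own toy assumption, not how EEBO-v3 or Baker and McEnery handled spelling. The terms are ones mentioned in this post.

```python
import re

def variant_pattern(term):
    """Case-insensitive regex for a term, allowing an optional final -e and
    an optional plural -s. This variant rule is a hypothetical simplification."""
    stem = re.escape(term.rstrip("e"))
    return re.compile(rf"\b{stem}e?s?\b", re.IGNORECASE)

def count_terms(text, terms):
    """Return {term: number of matches in text}."""
    return {t: len(variant_pattern(t).findall(text)) for t in terms}

sample = "The Sodomites of filthy Sodom; a catamite and two Catamites; a Ganymede."
print(count_terms(sample, ["sodomite", "catamite", "ganymede"]))
# {'sodomite': 1, 'catamite': 2, 'ganymede': 1}
```

Note that Sodom on its own isn’t counted as sodomite, which is the kind of boundary you have to decide on deliberately.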

What I did

I took all the words McEnery and Baker searched for and all the words they found in EEBO-v3 and presented them in a dictionary format to accompany this post (click here for dictionary). When possible, I’ve included the metadata from the paper like frequency in the EEBO-v3, era, and definition. From my own brain parts I contributed part of speech and pronunciation. The definitions are those that Baker and McEnery arrived at through collocational analysis. Those without definitions weren’t found in the corpus or weren’t used in a way that allowed for analysis.

Example:

² High Frequency is more than 500 hits in EEBO-v3, Mid Frequency is between 100 and 500, Low Frequency is from 10 to 100, and Infrequent is fewer than 10.
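The frequency bands are easy to turn into a lookup, with one wrinkle: as written, the Mid and Low ranges both include exactly 100 hits. The sketch below assumes 100 counts as Mid; that boundary choice is mine, not necessarily the dictionary’s.

```python
def frequency_band(hits):
    """Map a raw EEBO-v3 hit count to a frequency band.

    Assumption: 100 hits counts as Mid, since the footnote's ranges overlap there.
    """
    if hits > 500:
        return "High"
    if hits >= 100:
        return "Mid"
    if hits >= 10:
        return "Low"
    return "Infrequent"

for h in (1200, 500, 100, 99, 10, 3):
    print(h, frequency_band(h))
```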

Side note: My intention is for this to be fun because some of the words sound ridiculous to our 21st-century ears (he-strumpet comes to mind), but I would like to acknowledge that none of these were kind-hearted terms. They represent oppression and hate written into law. These laws penalized anyone the cisgender, heteronormative patriarchy found threatening. I went into this study with a love for lexicography, polysemy, and history, but it’s impossible to explore all of these words without experiencing a deep sadness and regret for the centuries of suffering these words represent.

Conclusion

It seems like only people who thought homosexuality was deviant wrote about it, and they wrote about it meanly. There isn’t a single self-referential use of any of these terms in the whole corpus. However, it is definitely interesting that sexual orientation was referenced at all, because some scholars claim that homosexuality wasn’t conceivable at that time. These words seem to argue against that.

Also cool is that there are so many different terms. Which to me says that there wasn’t just one concept of a man who was into “gay stuff,” but a variety of different ways to get involved. Sodomy could lead to execution, but ganymede and catamite weren’t accompanied by legal sentences. My favorite realization is that effeminacy wasn’t considered an indicator of sexuality. Apparently, it began to be associated with male homosexuality in the next century at which time guys who were afraid of retribution had to stop kissing each other in greetings and holding hands in public. Finally, it’s interesting that foreign languages and ancient Greek and Roman sources played a big role. And many authors described “these people” and “their acts” as being outside of England. So xenophobic.

Baker and McEnery have one final note for corpus linguists: get back to the text and get into concordancing. It’s called close reading, and it involves looking beyond your five-word context. Try it. I know I will.
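Concordancing means pulling up every hit for a word with its context lined up in a column, so you can actually read them. Here’s a bare-bones keyword-in-context (KWIC) sketch in Python; the width parameter is the “five-word context” (five tokens on each side by default), and the sentence is a made-up example.

```python
def kwic(tokens, node, width=5):
    """Keyword in context: (left, hit, right) for every case-insensitive match of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def show(lines, pad=30):
    """Print concordance lines with the hits aligned in one column."""
    for left, hit, right in lines:
        print(f"{left:>{pad}}  [{hit}]  {right}")

tokens = "in genesis the men of filthy sodom were wicked and sodom burned".split()
show(kwic(tokens, "Sodom", width=2))
```

Then comes the close-reading part: scanning lines like these, and the full texts behind them, instead of trusting the numbers alone.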

This article is great for lexicography bishes, history bishes, corpus bishes, and queer bishes.

Click here to proceed to the Dictionary of 17th Century Terms for Homosexual Men

————————————————————————————————————

McEnery, Tony, and Helen Baker. “The Public Representation of Homosexual Men in Seventeenth-Century England – a Corpus Based View.” Journal of Historical Sociolinguistics, vol. 3, no. 2, Jan. 2017, doi:10.1515/jhsl-2017-1003.

Maybe it’s a grime [t]ing: TH-stopping among urban British youth

I’ve been thinking a lot lately about how identity is something that we perform. I was introduced to this idea through my exploration of Iggy Azalea’s persona and performance for my first LinguaBishes post (here). It was my first glimpse at the tricky area of identity research. Not unlike code-switching, your identity performance at work is probably super different from the one you perform for your bishes. Identity can change from context to context, and it depends on your audience.

Identity is complex and luckily it evolves. Imagine if you were currently performing your identity from age 15.

In Rob Drummond’s recent paper, “Maybe it’s a grime [t]ing: TH-stopping among urban British youth,” he cites Bucholtz and Hall’s (2010: 19–25) five principles of identity. The gist is that identities are not fully formed, they’re not explicitly conceived, and they’re dynamic.

Adolescence is a time of emerging identities. One way teens attempt to craft their identities is by emulating their role models. Maybe you were a Spice Girls fan in 1997 and tried out your first British accent, or an emo Avril Lavigne fan in 2002 and decided to go out and get a bunch of eyeliner. These would both be conscious attempts to appear to be in the same group as your role models or to have a similar identity, but remember, identity performance isn’t always a conscious choice.

When Drummond was working on the UrBEn-ID (Urban British English and Identity) Project in Manchester (the one in the UK, ok bishes?), he noticed something interesting about 4 students who liked a specific kind of music: they performed TH-stopping some of the time.

TH-stopping is pronouncing a voiceless th as a t, like “thing” as “ting.” While less common than its voiced sister, DH-stopping (pronouncing “them” as “dem”), it occurs in many English varieties, including West Indian Englishes and Creoles, Jamaican Creole, British Creole, Irish English, and Liverpudlian. It is also associated with AAE, so it can be found in Hip-Hop and Grime.

Have you heard of Grime? It’s a type of music born out of early-2000s East London. Think “Fix Up, Look Sharp.” Grime, like Hip-Hop, is rooted in urban black culture, but because it bloomed out of East London, it is also cross-racial, using a multiethnolect (an ethnically neutral dialect) called Multicultural London English (MLE). More on that (in search of a Multicultural Urban British English (MUBE)).

A lot of previous work has looked at the language-ethnicity link. Does language reflect ethnicity? Or is it a social performance of ethnicity? I guess no one’s really all that sure, but in this specific case, Drummond found that ethnicity was most definitely not a factor.

While most research on adolescent identity (like Eckert’s) takes place in mainstream schools, the four boys at the center of this study were outside the mainstream education system. They attended a specialized learning center designed for students who didn’t fit into the mainstream system for a variety of reasons. The study took place over 2 years and had 25 participants, but TH-stopping was in such limited use that only these 4 boys stood out. To find out why they were TH-stopping, Drummond looked at a whole bunch of different variables, including sex, ethnicity, speech context, musical tastes, age, and a bunch more. Which variable stood out may surprise you…

While context was a significant factor (in a mock job interview, for example, TH-stopping didn’t occur), the biggest variable turned out to be music, but not reported taste in music. Specifically, it was whether the participant was observed rapping in class. For 3 of the 4 boys, rapping is almost a feature of speech, since they regularly slip in and out of it during conversation.
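The basic number behind a claim like “context was a significant factor” is a variant rate: of all the places a speaker could have TH-stopped, how often did they? A minimal sketch; the speaker label, contexts, and tokens are hypothetical, and I don’t know the exact statistical models Drummond ran. Deciding significance would need an actual model (variationists often use mixed-effects logistic regression), which this does not do.

```python
from collections import defaultdict

def variant_rates(tokens):
    """tokens: list of (speaker, context, stopped), where stopped is True when
    a voiceless th was realized as [t]. Returns {(speaker, context): rate}."""
    totals = defaultdict(int)
    stopped = defaultdict(int)
    for speaker, context, is_stopped in tokens:
        key = (speaker, context)
        totals[key] += 1
        stopped[key] += int(is_stopped)
    return {key: stopped[key] / totals[key] for key in totals}

# Hypothetical coded tokens for one made-up speaker "A".
tokens = [
    ("A", "peer talk", True), ("A", "peer talk", False),
    ("A", "peer talk", True), ("A", "peer talk", True),
    ("A", "mock interview", False), ("A", "mock interview", False),
]
print(variant_rates(tokens))
# {('A', 'peer talk'): 0.75, ('A', 'mock interview'): 0.0}
```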

The 4 boys used TH-stopping in conversations where they were trying to show in-group status with the street, urban, tough culture embodied by Grime. One example is a conversation they had about a mutual acquaintance who was about to get out of jail. Each boy was trying to show that this person was a friend of his, and each in turn referred to him as a tief (for “thief”). Another example is a different boy who, while discussing his favorite Grime artist, doesn’t TH-stop at first and then self-corrects in order to use it.

Drummond concludes that among the participants in this study, TH-stopping is not a marker of ethnicity but part of identity performance. It is a “linguistic resource” that helps align them with a general sense of tough or street culture embodied by Grime.

And just to be clear, it’s not like listening to this type of music has caused their dialects to change. It’s that in order to show that they live in the Grime world, they occasionally stop a TH and perform in-groupedness. This is the major take-away. That and the fact that ethnicity as a concept is not a meaningful mechanism for grouping people.

This should be taken into account in future studies that attempt to link identity and language.

——————————————————————————————————————-

Drummond, Rob. “Maybe It’s a Grime [t]Ing: Th-Stopping among Urban British Youth.” Language in Society, vol. 47, no. 02, 2018, pp. 171–196., doi:10.1017/s0047404517000999.

Eckert, Penelope. “Linguistic Variation as Social Practice: The Linguistic Construction of Identity in Belten High (Review).” Language, vol. 77, no. 3, 2001, pp. 575–577., doi:10.1353/lan.2001.0193.

Are Emojis Predictable?

Emojis are cool, right? Well, typing that sure didn’t feel cool, but whatever. The paper “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion explores the relationships between words and emojis by creating robot-brains that can predict which emojis humans would use in emoji-less tweets.

But what exactly are emoji (also, is the plural emoji or emojis?), and how do they interact with our text messaging? Gretchen McCulloch says you can think about them like gestures. So if I threaten you by dragging a finger across my throat IRL, a single knife emoji might do the trick in a text. But if they act like gestures in some cases, what are we to make of the unicorn emoji? Or the zombie? It’s not representative of eating brains, right? Right?? Tell me the gesture isn’t eating brains!

So, obviously, trying to figure out what linguistic roles emoji can play is tough, and it doesn’t help that they haven’t been studied all that much from a Natural Language Processing (NLP) perspective. Not to mention the perspective of AI. Will emoji robots take over the world like the post-apocalyptic dystopian hellscapes depicted in movies like… The Emoji Movie and… The Lego Batman Movie? Studying emojis will not only protect us from the emoji-ocalypse, but also help analyze social media content and public opinion. That’s called sentiment analysis btw, but more on all the things I just tried to learn later.

The Study (or Machine Learning Models, oh my 😖)

For this study, the researchers (from my alma mater, Universitat Pompeu Fabra) used the Twitter APIs to determine the 20 most frequently used emojis in 40 million tweets from the US posted between Oct 2015 and May 2016. Then they selected only the tweets that contained a single emoji from the top 20 list, which came to more than 584,600 tweets. Then they removed the emoji from each tweet and trained machine learning models to predict which emoji it had been. Simple, right?
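For the computationally thirsty, here’s a minimal Python sketch of that filtering step, assuming a tweet counts as usable only when exactly one emoji from the list shows up in it (it only looks for emojis on the list, so it ignores any others). The `TOP_EMOJIS` list is a hypothetical, shortened stand-in for the paper’s real top 20, and `make_example` is my own made-up helper, not the researchers’ code.

```python
# Hypothetical, shortened stand-in for the paper's top-20 emoji list.
TOP_EMOJIS = ["❤", "😂", "😍", "💯", "🔥", "😭", "💕", "🎄"]


def make_example(tweet, emoji_list=TOP_EMOJIS):
    """Turn a tweet into (text_without_emoji, emoji_label), or None if the tweet
    doesn't contain exactly one emoji from emoji_list."""
    # Drop the invisible variation selector so "❤️" and "❤" count as the same emoji.
    text = tweet.replace("\ufe0f", "")
    counts = {e: text.count(e) for e in emoji_list}
    if sum(counts.values()) != 1:
        return None  # zero emojis, or more than one: skip this tweet
    label = next(e for e, c in counts.items() if c == 1)
    # Remove the emoji and tidy up leftover whitespace; the label is the "answer".
    cleaned = " ".join(text.replace(label, " ").split())
    return cleaned, label


for tweet in ["I love you ❤", "lol 😂😂", "merry christmas 🎄", "no emoji here"]:
    print(tweet, "->", make_example(tweet))
```

So “I love you ❤” becomes the text “I love you” with the answer ❤, while “lol 😂😂” gets tossed because it has two.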

Now just to be clear, the methods in this study are way above my head. I don’t want anyone confusing me for someone who understands exactly what went on here because I was fully confused through the entire methods section. I tried to summarize what little understanding I think I walked away with, but found there was just way too much content. So I wrote a companion dictionary of terms for the most computationally thirsty bishes (“Companion to ‘Are Emojis Predictable?’”).

So actually, two experiments were performed. The first compared the abilities of different machine learning models to predict which emoji should accompany a tweet. The second compared the performance of the best model to human performance.

The Robot Face-Off (🤖 vs 🤖)

In the first experiment, the researchers removed the emoji from each tweet. Then they used 5 different models (see companion dictionary for more info) to predict what the emoji had been:

  1. A Bag of Words model
  2. Skip-Gram Average model
  3. A bidirectional LSTM model with word representations 
  4. A bidirectional LSTM model with character-based representations 
  5. A skip-gram model trained with and without pre-trained word vectors

They found that the last three (the neural models) performed better than the first two (the baselines). From this they concluded that emoji collocate with specific words. For example, the word love collocates with ❤. I’d also like to take a moment to point out a study by Dürscheid and Siever (2017), which finds that emojis are mostly used alongside words, not in place of them. So we’re more likely to text “I love you ❤” than “I ❤ you.”
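One rough way to see collocation in action is to count which words show up in tweets labeled with each emoji. This Python sketch does exactly that on a few made-up tweets (not data from the paper). The neural models learn much richer patterns than raw counts, so think of this as intuition only.

```python
from collections import Counter, defaultdict

# Made-up (text, emoji) examples, in the same shape as the filtered tweets.
examples = [
    ("i love you", "❤"),
    ("love this song", "❤"),
    ("merry christmas everyone", "🎄"),
    ("lol i cant", "😂"),
    ("lol this is so me", "😂"),
]


def cooccurrence(examples):
    """For each emoji, count how often each word appears in tweets labeled with it."""
    counts = defaultdict(Counter)
    for text, emoji in examples:
        counts[emoji].update(text.lower().split())
    return counts


counts = cooccurrence(examples)
print(counts["❤"].most_common(1))  # [('love', 2)]
print(counts["😂"].most_common(1))  # [('lol', 2)]
```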

 

The Best “Robot”

The best performing model was the char-BLSTM with pretrained vectors on the 20 emojis. Apparently frequency has a lot to do with it: it shouldn’t be surprising that the model predicts the most frequent emojis more often. So in a case where the word love is used with 💕, the model would prefer ❤. The model also confuses emojis that are used frequently and in varied contexts, like 😂 and 😭. They’re both used in contexts with a lot of exclamation points, lols, hahas, and omgs, and often with irony.
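Here’s a toy illustration of why frequency can win, using a naive-Bayes-style score (how common an emoji is overall, times how often it appears with “love”). That scoring rule is a stand-in I picked for illustration, not how the char-BLSTM works, and all the counts are made up.

```python
from fractions import Fraction

# Made-up counts, NOT from the paper.
emoji_totals = {"❤": 1000, "💕": 100}  # training tweets per emoji
with_love = {"❤": 300, "💕": 60}       # of those, how many contain "love"


def score(emoji):
    """Naive-Bayes-style score: P(emoji) * P("love" | emoji)."""
    prior = Fraction(emoji_totals[emoji], sum(emoji_totals.values()))
    likelihood = Fraction(with_love[emoji], emoji_totals[emoji])
    return prior * likelihood


scores = {e: score(e) for e in emoji_totals}
best = max(scores, key=scores.get)
print(best)  # ❤
```

Even though “love” is relatively more typical of 💕 here (60% of its tweets vs. 30% for ❤), ❤ is ten times more common overall, so it wins.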

The case of 🎄 was interesting. There were only 3 in the test set, and the model correctly predicted it on the two occasions where the word Christmas was in the tweet. It got the one case without Christmas wrong.

Second experiment: 🙍🏽 vs 🤖

The second experiment compared human performance to the char-BLSTM. Human participants read a tweet with the emoji removed and then guessed which of five emojis (😂, ❤, 😍, 💯, and 🔥) fit.

They crowdsourced it. And guess what? The char-BLSTM won! It had a hard time with 😍 and 💯, and humans mainly messed up 💯 and 🔥. For some reason, humans kept putting in 🔥 where it should have been 😂. The char-BLSTM probably didn’t do that as much because of its preference for high-frequency emojis.

Conclusion

The BLSTMs outperformed the other models, and the char-BLSTM outperformed the humans. Which sounds a lot like a Terminator-style emoji-ocalypse to me. This paper suggests not only that an automatic emoji prediction tool can be created, but also that it may predict emojis better than humans can, and that there is a link between word sequences and emojis. But because different communities use emojis differently and because they don’t necessarily play the role of words, it’s extremely difficult to define their semantic roles, not to mention their “definitions.” And while there are some lofty attempts (notably Emojipedia and The Emoji Dictionary) to “define” them, the lack of consensus makes this basically impossible for the vast majority of them.

I recommend this article to emoji kweens, computational bishes 💻, curious bishes 🤔, and doomsday bishes 🧟‍♀️.

Thanks to Rachael Tatman, whose post “How do we use Emoji?” brought some great research to our attention. If you don’t have the stomach for computational methods but care about emojis, definitely check out her post.

Barbieri, Francesco, et al. “Are Emojis Predictable?” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017, doi:10.18653/v1/e17-2017.

Dürscheid, C., & Siever, C. M. (2017). “Beyond the Alphabet–Communication of Emojis.” Short version of a manuscript (in German) submitted for publication.

Tatman, Rachael. “How Do We Use Emoji?” Making Noise & Hearing Things, 22 Mar. 2018, makingnoiseandhearingthings.com/2018/03/17/how-do-we-use-emoji/.

Companion to “Are Emojis Predictable?”

Welcome to the companion to “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion.

This is where I’ve attempted to provide some semblance of explanation for the methods of the study. Look, I tried my best with this, so don’t judge. I ordered the terms by how difficult they were for me instead of alphabetically. References are at the end for thirsty bishes who just can’t get enough.

Each NLP model or term below comes with an emoji rating how difficult it was for me, from 😀 (easiest) to 🤮 (hardest).
😀 Sentiment Analysis

A way of determining and categorizing opinions and attitudes in a text using computational methods. Also opinion mining.
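A bare-bones flavor of sentiment analysis is the lexicon approach: give each word a score and add them up. This Python sketch uses a tiny hypothetical word list (`LEXICON` is made up, not a real sentiment resource), and real systems are far more sophisticated.

```python
import string

# A tiny made-up sentiment lexicon (not a real resource): positive words > 0, negative < 0.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "hate": -2, "awful": -2}


def sentiment(text):
    """Add up word scores (ignoring surrounding punctuation) and report the overall polarity."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this song!"))  # positive
print(sentiment("Mondays are awful"))  # negative
```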

☺️ Neural Network

A computing model made of lots of simple connected units (“neurons”), loosely based on how the human brain works.
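To make “neuron” a little less mystical, here’s a sketch of a single artificial neuron in Python: it multiplies its inputs by weights, adds a bias, and squashes the result between 0 and 1. The weights below are arbitrary; in a real network they’re learned during training, which this sketch doesn’t show.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))


# Arbitrary example weights: this neuron "fires" (output near 1) only when both inputs are on.
print(round(neuron([1, 1], [5, 5], -7), 3))  # 0.953
print(round(neuron([1, 0], [5, 5], -7), 3))  # 0.119
```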

🙂 Recurrent Neural Network

A type of neural network that reads a sequence one item at a time and carries information from earlier items forward, so it can make context-based predictions. Also RNN.
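Here’s a sketch of the “carrying information forward” part, assuming a tiny recurrent network with a single hidden unit and made-up (untrained) weights `w_x` and `w_h`. Each new state mixes the current item with the previous state, so the same numbers in a different order end up in a different final state.

```python
import math


def rnn_states(sequence, w_x=0.5, w_h=0.8):
    """Run a one-unit recurrent network over a sequence of numbers (made-up weights).
    Each new hidden state h mixes the current item x with the previous state."""
    h = 0.0
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


# Same numbers, different order, different final state: the network "remembers" order.
print(rnn_states([1, 0, 0])[-1])
print(rnn_states([0, 0, 1])[-1])
```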

🙂 Bag of Words

A simple model (not really a neural network on its own) that basically counts up the number of instances of each word in a text. It’s good at classifying texts by word frequencies, but because it identifies words by the white space surrounding them and disregards grammar and word order, phrases lose their meaning. Also BoW.
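Here’s what that looks like with Python’s `collections.Counter`, splitting on white space just like the definition says. Notice how “Dog bites man” and “Man bites dog” end up with the exact same bag.

```python
from collections import Counter


def bag_of_words(text):
    """Split on white space and count each word; grammar and word order are thrown away."""
    return Counter(text.lower().split())


print(bag_of_words("Dog bites man") == bag_of_words("Man bites dog"))  # True
print(bag_of_words("love love you"))  # Counter({'love': 2, 'you': 1})
```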

😐 Skip Gram

A neural network model that does roughly the opposite of the BoW. Instead of looking at the whole context at once, the skip gram considers word pairs separately: it tries to predict the surrounding context from a single word, and it weighs closer words more than further ones. So the order of words is actually relevant. It’s one of the two models that make up Word2Vec.
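This Python sketch builds the (center word, context word) pairs a skip-gram model trains on, using a fixed window. The real Word2Vec implementation also randomly shrinks the window for each word, which is how closer words end up counting more often; I left that out to keep the output predictable.

```python
def skipgram_pairs(tokens, window=2):
    """List the (center word, context word) pairs a skip-gram model learns from:
    each word is paired with every word up to `window` positions away on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs("i love you".split(), window=1))
# [('i', 'love'), ('love', 'i'), ('love', 'you'), ('you', 'love')]
```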

😐 Long Short-term Memory Network

A recurrent neural network with a special memory cell and “gates” that decide what to keep, add, and forget, so it can learn the order of items in long sequences and predict them. Also LSTM.
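For the brave: a sketch of a one-unit LSTM in plain Python, following the standard forget/input/output gate setup. The forget gate decides how much old memory to keep, the input gate decides how much new information to write, and the output gate decides how much to reveal. The `WEIGHTS` values are made up, not trained, and a real LSTM has many units per layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up, untrained weights for a one-unit LSTM: (input weight, recurrent weight, bias).
WEIGHTS = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}


def gate(name, x, h_prev, activation):
    w_x, w_h, b = WEIGHTS[name]
    return activation(w_x * x + w_h * h_prev + b)


def lstm_step(x, h_prev, c_prev):
    f = gate("forget", x, h_prev, sigmoid)       # how much old memory to keep (0 to 1)
    i = gate("input", x, h_prev, sigmoid)        # how much new information to write (0 to 1)
    o = gate("output", x, h_prev, sigmoid)       # how much memory to reveal (0 to 1)
    g = gate("candidate", x, h_prev, math.tanh)  # the new information itself (-1 to 1)
    c = f * c_prev + i * g                       # updated long-term memory (cell state)
    h = o * math.tanh(c)                         # output for this step (hidden state)
    return h, c


def run_lstm(sequence):
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c)
        outputs.append(h)
    return outputs


print(run_lstm([1.0, 0.0, 0.0]))
```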

😑 Bidirectional Long Short-term Memory Network

The same as above, but it’s basically time travel: one LSTM reads the sequence forwards and another reads it backwards, so at every position the model knows about what came before and what comes after. Also BLSTM.
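Here’s the time-travel trick sketched with a simple one-unit recurrent layer standing in for each of the two LSTMs (a simplification, with made-up weights). One pass reads left to right, the other reads right to left with its own weights, and each position gets a pair of states.

```python
import math


def rnn_states(sequence, w_x, w_h):
    """Hidden states of a one-unit recurrent layer reading left to right (made-up weights)."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_states(sequence):
    """Pair each position's forward state (built from that item and everything before it)
    with its backward state (built from that item and everything after it)."""
    forward = rnn_states(sequence, w_x=0.5, w_h=0.8)
    # Read the sequence right to left with separate weights, then flip back into position order.
    backward = rnn_states(sequence[::-1], w_x=0.3, w_h=0.6)[::-1]
    return list(zip(forward, backward))


for position, (fwd, bwd) in enumerate(bidirectional_states([0.0, 0.0, 1.0])):
    print(position, round(fwd, 3), round(bwd, 3))
```

For the input [0, 0, 1], the first position’s forward state is 0, but its backward state already reflects the 1 at the end.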

😓 Char-BLSTM

A BLSTM that builds its representations of words from their characters, so words that look similar get similar representations and it can handle variants of the same word (think love vs. loooove). In this study it was more accurate than the word-based variety.
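The char-BLSTM learns its character-based representations, but you can get a feel for why characters help with a much simpler trick swapped in here: comparing words by their overlapping three-letter chunks (character trigrams). This isn’t what the paper’s model does, just an intuition pump.

```python
import math
from collections import Counter


def char_trigrams(word):
    """Count the overlapping three-character chunks, with < and > marking the word edges."""
    padded = f"<{word}>"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(word1, word2):
    """Cosine similarity between two words' trigram counts (1.0 = identical profile)."""
    a, b = char_trigrams(word1), char_trigrams(word2)
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


print(similarity("love", "loooove"))  # 0.5
print(similarity("love", "hate"))     # 0.0
```

“love” and “loooove” share half their trigram profile, while “love” and “hate” share nothing, even though a word-by-word model would see all three as unrelated tokens.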

😖 Word-BLSTM

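The word-based variant of the above: the BLSTM reads whole words instead of individual characters, so it can’t tell that spelling variants are related. (I think? Probably?)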

🤮 Word Vector

Ya, this one is umm… well, you see, it has magnitude and direction. And like, you have to pre-train it. So… “Fuel your lifestyle with .”
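Okay, less “umm”: a word vector is just a list of numbers, so it has a length (magnitude) and a direction, and words used in similar contexts should point in similar directions. This sketch uses three made-up dimensions (real pre-trained vectors have hundreds) and compares directions with cosine similarity.

```python
import math

# Made-up 3-dimensional word vectors; real pre-trained ones have hundreds of dimensions.
VECTORS = {
    "love": [0.9, 0.8, 0.1],
    "adore": [0.8, 0.9, 0.2],
    "tax": [-0.5, 0.1, 0.9],
}


def magnitude(v):
    """The vector's length."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Compare directions only: 1 = same direction, 0 = perpendicular, -1 = opposite."""
    return sum(a * b for a, b in zip(u, v)) / (magnitude(u) * magnitude(v))


print(round(cosine(VECTORS["love"], VECTORS["adore"]), 2))  # 0.99
print(round(cosine(VECTORS["love"], VECTORS["tax"]), 2))    # -0.22
```

Doubling a vector changes its magnitude but not its direction, so its cosine similarity with the original stays at 1.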

Congratulations if you’ve made it this far! You probably already know more than me. Scream it out. I know I did 🙂

REFERENCES

Bag of Words (BoW) – Natural Language Processing, ongspxm.github.io/blog/2014/12/bag-of-words-natural-language-processing/.

Britz, Denny. “Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.” WildML, 8 July 2016, www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/.

Brownlee, Jason. “A Gentle Introduction to Long Short-Term Memory Networks by the Experts.” Machine Learning Mastery, 19 July 2017, machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/.

Brownlee, Jason. “A Gentle Introduction to the Bag-of-Words Model.” Machine Learning Mastery, 21 Nov. 2017, machinelearningmastery.com/gentle-introduction-bag-words-model/.

Chablani, Manish. “Word2Vec (Skip-Gram Model): PART 1 – Intuition.” Towards Data Science, 14 June 2017, towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b.

Verwimp, Lyan, et al. “Character-Word LSTM Language Models.” arXiv, Cornell University Library, 10 Apr. 2017, arxiv.org/abs/1704.02813.

Olah, Christopher. “Understanding LSTM Networks.” Colah’s Blog, Aug. 2015, colah.github.io/posts/2015-08-Understanding-LSTMs/.

Nielsen, Michael A. “Neural Networks and Deep Learning.” Determination Press, 2015, neuralnetworksanddeeplearning.com/chap1.html.

“Sentiment Analysis: Concept, Analysis and Applications.” Towards Data Science, 7 Jan. 2018, towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17.

gk_. “Text Classification Using Neural Networks.” Machine Learnings, 26 Jan. 2017, machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6.

Thireou, T., and M. Reczko. “Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, 2007, pp. 441–446., doi:10.1109/tcbb.2007.1015.

“Vector Representations of Words.” TensorFlow, www.tensorflow.org/tutorials/word2vec.

McCormick, Chris. “Word2Vec Tutorial – The Skip-Gram Model.” 19 Apr. 2016, mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.

Gendered representations through speech: The case of the Harry Potter series

In this paper Dr. Eberhardt (2017) looks at the way gender is represented in the Harry Potter series (the books) by comparing the verbs used to report the speech of Harry’s two sidekicks, Hermione and Ron. She found that even though the words used to report their speech are largely the same, subtle patterns revealed negative gender stereotypes.

Background

There is a really popular notion that gender determines the way we speak. Luckily, sociolinguists don’t really follow this line of thinking anymore. Instead they ask how gender and identity interact with a bunch of different aspects. Unluckily, outside of linguistics, this Men-are-from-Mars BS is so popular that it has seeped into the subconscious of even writers like J.K. Rowling who are attempting to create feminist characters.

And guess what, Eberhardt points out that men and women pretty much use language the same way. Also, fun feminist side note: did you know about the semantic degradation of the female terms in gendered word pairs? And not just word pairs, but also their collocations? And maybe not just in English? It’s a thing. For example, spinster is negative, but bachelor isn’t. It’s called pejoration and it’s interesting. Look it up.

So we can say that language is used to reinforce tired stereotypes, but it’s not language alone because language is culture. So if language is continually depicting women as meek, emotional, or say, shrill (right?), it’s because we’ve all bought into it as a language community.

And as far as children’s literature goes (which btw Harry Potter is despite the number of adults who also enjoy the series), gender representation has a huge impact on how children learn gender-specific behavior. When I was a voracious young reader, I learned I should barely hold back tears, cry out desperately, stammer, and say things breathily to be a proper girl.

No shade on Rowling, who wrote a beloved series, created compelling characters and story, and made a massive contribution to culture. However, she has been repeatedly criticized for her failed attempt to create a feminist character. Almost twice as many men are mentioned by name in the series as women¹, some² contend the books reinforce the patriarchy, and others³ think JK’s attempt at gender equality is superficial.

The How of it all

The entire set of seven books comprises a corpus of 1.1 million words. Eberhardt looked at all the times Hermione and Ron spoke and checked out which verbs were used to describe their speech. As it turns out, Ron has only a smidge more reporting verbs (2154) than Hermione (1937). However, only Hermione cried, shrieked, ordered, and screamed, and her neutral speech is described differently.
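For the computational bishes, here’s a rough Python sketch of the kind of counting involved, assuming speech is tagged in the familiar “We have to go,” Hermione said pattern (or “said Hermione”). The verb list, the regex, and `count_reporting_verbs` are my own hypothetical stand-ins, not Eberhardt’s method, and the sample sentences are invented, not quoted from the books.

```python
import re
from collections import Counter

# Hypothetical verb list and regex: a rough stand-in, not Eberhardt's method.
REPORTING_VERBS = ["said", "asked", "cried", "shrieked", "shouted", "yelled", "muttered", "whispered"]
CHARACTERS = ["Hermione", "Ron"]

verbs = "|".join(REPORTING_VERBS)
names = "|".join(CHARACTERS)
# Closing speech punctuation and quote mark, then either "Name verb" or "verb Name".
PATTERN = re.compile(
    rf'[,!?]["”]\s+(?:(?P<n1>{names})\s+(?P<v1>{verbs})|(?P<v2>{verbs})\s+(?P<n2>{names}))\b'
)


def count_reporting_verbs(text):
    """Count which reporting verbs introduce each character's speech."""
    counts = {name: Counter() for name in CHARACTERS}
    for match in PATTERN.finditer(text):
        name = match.group("n1") or match.group("n2")
        verb = match.group("v1") or match.group("v2")
        counts[name][verb] += 1
    return counts


# Invented sample text, not quoted from the books.
sample = ('"We have to go," Hermione said. "No way!" shouted Ron. '
          '"Look out!" Hermione shrieked. "Fine," Ron muttered.')
print(count_reporting_verbs(sample))
```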

Unique verbs

A closer look at the verbs that were unique to Ron and Hermione revealed that all but one of Hermione’s unique verbs are stereotypical. They’re either high-pitched fear or sadness slash helplessness. Ten points if you can name three from each category (key at the end). Ron’s are either loud (bellow, roar) or emotionally distant (mumble, grumble, grunt). Ron shouts and yells, Hermione gasps and snaps. He mutters, but she whispers. So ya, they’re both super reinforcing of stereotypes.

When a character uses cry for a magical incantation, it suggests that the spell was performed in a loud, emotional, high-pitched voice. Ron only does this once. Hermione uses it 37 times for both spell casting and, more frequently, for emotion. This frequency increases throughout the series. So boys are angry and loud and girls are increasingly upset as they mature and become sexually viable.

Another way to look at the reporting verbs in the novels is to check out which verbs both Hermione and Ron share, but which, like cry, are used different amounts. Eberhardt found that both characters suggest and demand but with inverse frequencies. So he demands more than twice as much as she does and she suggests twice as much as he does. Probably because men are assertive and women are cooperative, right?

Modified speech

Apart from their unique verbs, Eberhardt also noticed a difference in the way their verbs were described. Not only is Hermione’s speech described in more detail than Ron’s, but she also has fewer neutral reporting verbs like say or ask. Ron’s modifiers show his knowledge, but Hermione’s show her feelings, and her feelings are often negative. She gets angry and fearful, and sometimes says things seriously.

For example, in a heated argument Ron’s speech is reported with said alone, but Hermione’s said is modified by her voice unusually high. Ron’s speech was also reported with loud, violent words like hurled at and shouted, while Hermione’s speech was reported with cry.

In this way we may infer Ron’s emotions from context, but they’re clearly not as important. The same goes for Hermione’s intelligence: she is ostensibly the most knowledgeable of the crew, but that is not as important as her emotions, which get described in detail. Is this in-depth description of Hermione’s emotions a part of the way we feel entitled to scrutinize and judge women’s appearances, voices, and actions? You tell me.

So?

Hermione and Ron are mostly the same, but the areas where they are different are interesting. Are the differences due to their genders or are we to believe that Hermione is just one emotional young woman independent of stereotypes? Maybe if the verbs of the other characters were also examined, we’d find that many more male characters cry and shriek and many female ones mutter, but since the results so closely align with gender stereotypes and the findings of other studies, maybe not.

Also, Eberhardt points out that this binary pattern (Hermione is emotional and Ron rational) supports the theory that women are one thing and men a different thing, and presents this belief to young minds. For all the good Rowling does in creating a feminist icon, she undoes by instilling this stereotypical ideology in impressionable minds.

Read this article if you’re a sociolinguist bish, a language and gender bish, or a witchy bish.

Key to Hermione’s Unique Verbs

High-pitched fear:

  • scream
  • squeal
  • shriek

Sadness/helplessness: 

  • squeak
  • wail
  • whimper

Eberhardt, M. (2017). Gendered representations through speech: The case of the Harry Potter series. Language and Literature, 26(3), 227–246. doi:10.1177/0963947017701851

¹Heilman, E., & Donaldson, T. (2009). Representations of Gender in the Harry Potter Series. In Critical Perspectives on Harry Potter (p. 139).

²Dresang, E. (2002). Hermione Granger and the Heritage of Power. In The Ivory Tower and Harry Potter: Perspectives on a Literary Phenomenon (p. 211).

³Yeo, M. (2004). Harry Potter and the Chamber of Secrets: Feminist Interpretations/Jungian Dreams. SIMILE: Studies In Media & Information Literacy Education, 4(1), 1–10. doi:10.3138/sim.4.1.002

Police interviews with vulnerable people alleging sexual assault: Probing inconsistency and questioning conduct

This paper examines real police interviews with people with intellectual disabilities who were reporting sexual assault. Focusing on how officers probe inconsistencies in the victim’s account with pragmatically difficult questions, Antaki, Richardson, Stokoe, and Willott attempt to determine how well officers follow recommended interview guidelines.

It’s well known that cops have taken insensitive lines of questioning with victims of sexual assault and rape. Stories of victims being asked what they were wearing or if they’d been drinking or other irrelevant questions about the context of their attack are as common as they are infuriating.

Implying fault and questioning the victim’s conduct is demoralizing to a person who is already feeling guilt, shame, and fear. Worse, it discredits the victim’s statement for judicial processing. And that’s just for intellectually typical victims, who may have the language processing skills to clarify details and defend themselves.

People with intellectual disabilities have even more obstacles to overcome. They are more likely to be victims of abuse and violence, less likely to succeed in prosecuting their assaulters, and suffer greater emotional and psychological distress after the event to boot. This paper doesn’t specify what is meant by “intellectual disability,” except to say that those with intellectual disabilities, learning, or psychiatric problems, can struggle to communicate, function socially, and to read pragmatic linguistic clues (head to the ever current and informative Conscious Style Guide for a brush-up on terms).

Antaki et al. generously point out that police are in a tough place because they need to be able to present the victim’s statement as evidence in court. This means they need to obtain a clear account of events from a recently traumatized person who may have a hard time discussing their assault or remembering it clearly. If that person also struggles with communication and social functioning, it can be even more difficult to compile a coherent series of events. Toss in a little difficulty reading pragmatic linguistic clues like non-literal expressions and hypothetical and indirect questions (you know, things that exist in typical conversations and interviews) and then think about how well those interviews go.

You might be thinking that given the frequency and severity of sexual assault, cops are probably trained to interview victims. Well, ya… kinda. The Royal College of Psychiatrists advises that the police in this study receive training for interviewing people with intellectual disabilities, and it provides a general guide to help with that. The guide points out that inconsistencies and omissions are usually caused by the interviewer jumping to conclusions. It says cops should never voice suspicion, call the witness a liar, or challenge them directly. The guide is not specific to those with intellectual disabilities, however, and there doesn’t seem to be any mechanism for tracking how well the guide is followed, let alone how well it works for those with intellectual disabilities.

The focus of this study is to determine how well actual police interviews adhere to this guide when interviewing people with intellectual disabilities, especially when probing inconsistencies with pragmatically difficult questions. Evidence was gathered from 19 interviews with people with what the English police force called “learning disabilities” reporting sexual assault or rape. Of the 19, only 3 went to court, and only 2 of those resulted in a guilty verdict.

Results

Spoiler alert: there were departures from the guidelines, mainly in areas the guide explicitly advised against. Interviewers were a) implying the story made no sense or was very unlikely or b) implying the witness’s behavior was to blame. These implications involve complex pragmatics that may be difficult for those with intellectual disabilities to process.

Basically, these questions present a logical problem that requires extra processing that people with intellectual disabilities might not be able to handle. Hypothetical phrasings like “If it was raining, why didn’t you bring an umbrella?” cast doubt and indicate failure to do something appropriate, but the interviewee may not pick up on that. Hypothetical questions also require the interviewee to process something that did not happen and is not a part of their memory. On top of that, they need to see that their conduct was unexpected or wrong and detect the implication of blame in order to defend themselves and their credibility. Complicated.

These types of questions challenge the victim’s conduct and truthfulness. This is exactly what the interviewers are asked not to do. The extra stress added by these questions can even impede memory, which is why answers to these challenging questions are frequently “I don’t know.” This is a problematic answer since a person is expected to know why they do what they do. Being unable to explain one’s actions is a credibility nightmare.

Discussion

As the guide says, asking why causes more problems than it fixes. It promotes the feeling of blame when victims often already blame themselves.

And while it is tough for interviewers because they have to record a first-hand statement as evidence for court and check for inconsistencies and vagueness, in order to serve the victim well, the guidelines need to be taken seriously and adherence to them needs to be monitored.

Without very rigorous training and a high level of language competence, it is unlikely that a police officer, or anyone, would have the skills to identify the pragmatic aspects of their own speech or to consider the pragmatic capacity of those with intellectual disabilities.

Even though this study is based on a small sample size, Antaki et al. recommend avoiding probing especially with the hypothetical “Why didn’t you X?”. That seems reasonably obvious, but beyond that there needs to be a robust system for identifying the needs of a victim. Descriptions made by the police of the victim’s disability were cursory. Labels like “learning disability” or “deaf” aren’t helpful or informed assessments.

Finally, interviewing is a skill, and those doing it need to be highly trained to serve the victim and their specific needs. That could mean teaching some basic pragmatics to officers so they can avoid complex logical problems, bringing experienced linguists onto the force, or other better ideas I haven’t thought of. Actually applying linguistics to interviews could be the difference between putting a sex offender behind bars or back on the street.

This article is great for pragmatics and sociolinguistics bishes or bishes interested in discourse analysis. There’s even a fun smidgen of Wh-movement and NPI licensing for my syntax bishes.


Antaki, C., Richardson, E., Stokoe, E., & Willott, S. (2015). Police interviews with vulnerable people alleging sexual assault: Probing inconsistency and questioning conduct. Journal of Sociolinguistics, 19(3), 328–350. doi:10.1111/josl.12124