Author: Caitlin O'Connell

Field Notes from 2018’s Adventures in Applied Linguistics

Happy Birthday to us! We’ve been doing the bish thing for a year, so I guess we have to do that tired old practice of recapping because like Kylie, we had a big year.

TL;DR – following is a list of our plans for 2019 and a recap of what we learned in 2018.

This is a still from Kylie Jenner's 2016 New Year Resolutions video. It shows her head and shoulders with the quote "like, realizing things..."

#goals

    1. We’re looking for guest writers. So if you know any other linguabishes, send them our way.
    2. We’re diversifying our content to include not just peer-reviewed articles in academic papers, but also conference papers, master’s theses, and whatever else strikes our fancies.
    3. We’re planning to provide more of our own ideas like in the Immigrant v. Migrant v. Expat series (posts 1, 2, and 3) and to synthesize multiple papers into little truth nuggets.
    4. Hopefully it won’t come up, but we’re not beyond dragging any other racist garbage parading as linguistics again.

Plans aside, here’s all the stuff we learned. We covered a lot of topics in 2018, so it’s broken down by theme.

Raciolinguistics and Language Ideology

We wrote 5 posts on language ideology and raciolinguistics and we gave you a new word: The Native-speakarchy. Like the Patriarchy, the Native-speakarchy must be dismantled. Hence Dismantling the Native-Speakarchy Posts 1, 2, and 3. Since we had a bish move to Ethiopia, we learned a little about linguistic landscape and language contact in two of its regional capitals. Finally, two posts about language ideology in the US touch on linguistic discrimination. One was about the way people feel about Spanish in Arizona and the other was about Spanish-English bilingualism in the American job market. 

This is a gif of J-Lo from the Dinero music video. She’s wearing black lingerie and flipping meat on a barbecue in front of a mansion. She is singing “I just want the green, want the money, want the cash flow. Yo quiero, yo quiero dinero, ay.”

Pop Culture and Emoji

But we also had some fun. Four of our posts were about pop culture. We learned more about cultural appropriation and performance from a paper about Iggy Azalea, and one about grime music. We also learned that J.K. Rowling’s portrayal of Hermione wasn’t as feminist as fans had long hoped. Finally, a paper about reading among drag queens taught that there’s more to drag queen sass than just sick burns.

Emojis aren’t a language, but they are predictable. The number one thing this bish learned about emojis though is that the methodology used to analyze their use is super confusing.

This is a gif of the confused or thinking face emoji fading in and out of frame.

Lexicography and Corpus

We love a dictionary and we’ve got receipts. Not only did we write a whole 3-post series comparing the usages of Expat v. Immigrant v. Migrant (posts 1, 2, and 3), but we also learned what’s up with short-term lexicography, and made a little dictionary of words for gay men in the 1600s.

Sundries

These comprise a grab bag of posts that couldn’t be jammed into one of our main categories. These are lone wolf posts that you only bring home to your parents to show them you don’t care what they think. These black sheep of the bish family wear their leather jackets in the summer and their sunglasses at night.

This is a black and white gif of Rihanna looking badass in shades and some kind of black fur stole.

Dank Memes

Finally, we learned that we make the dankest linguistics memes. I leave you with these.

 Thanks for reading and stay tuned for more in 2019!


Paper Drags: Do Linguistic Structures Affect Human Capital?

In case you’ve been off linguistics Twitter for the last week, you should know that it conniptioned last Wednesday. This is what happened.

A study was dropped (ya, academics drop papers) that claimed that in countries where the dominant language allowed pronouns to be omitted, education suffered.

There were a lot of hot takes with linguists sashaying into Twitter for an opportunity to drag this quote unquote study.

TL;DR: the study ignores current work in the field, doesn’t collaborate, uses sloppy methods, and arrives at biased results.

Here are the problems with it as mined from Twitter:

Research:

This study by Horst Feldmann (2018) is not based on current research in linguistics. The “recent” research that is referenced in the introduction is a baloney economics study from 2013 by M. Keith Chen. It was dragged in its own time for its interpretation of the now infamous Theory of Linguistic Relativity.

Theory of Linguistic Relativity: This is a nearly century-old hypothesis that claimed that an individual’s thoughts are restrained by the languages they speak. It is also known as Whorfianism.

A heck-ton of studies over the last 100 years have attempted to prove or disprove this theory. These days, linguists generally accept that language does or might have some effect on thought, but that we’re not quite sure how large that effect is or might be. I’m not going to get into it here, but if you want to learn more, get reading!

Feldmann, like Chen before him, ran with what we call the strong version of the hypothesis. He boldly claims that “…language shapes speakers’ mental representation of reality…” which it doesn’t. If Feldmann had studied linguistics, he would have known that.

This leads us to the second major issue:

Author expertise:

@GretchenAMcC compared this type of study to a linguist writing an economics paper. @sesquiotic pointed out that the study was not even co-authored by a linguist. He tweeted that the study has a “crib-toy use of linguistics” and that its chain of reasoning and supposition is patently problematic.

This is all a part of the invisibility of the linguistics field. @adamCSchembri pointed out that somehow, linguists aren’t considered experts by academicians in other fields.

But since Feldmann went ahead and decided to act the linguist anyway, let’s look at his premise:

The premise:

The premise of the paper is that there are languages that license the dropping of the pronoun before a verb. That’s true. A common example is Spanish whose speakers could say “yo hablo” (I speak), but can use just the “hablo” part if they want. Ok, so that’s an incredibly overly simplified explanation, but that’s for another time.

What Feldmann got wrong was claiming that English does not license the dropping of the pronoun. Actually speakers of English do it all the time. For example “do you speak English?” “Sure do!” or “Guess so.”

Yep, that’s pronoun drop. So the premise is wrong. This brings us to the bad linguistics of it all:

Bad linguistics:

@sesquiotic: the study doesn’t include actual linguistics and makes some pretty big claims about linguistics.

This paper is full of bad linguistics so here’s a list of a few that came up on Twitter:

  1. Misspelling hablo as ablo
  2. Studying 103 languages, but not mentioning which ones
  3. Mentioning spoken language, but not including any in their data
  4. Not defining or citing language variables used in regression tables
  5. Grouping together languages without acknowledging language families
  6. Using English example sentences that no one has ever uttered (I speak)
  7. Claiming that V-S-O languages are the most common, but not backing it up with evidence
  8. Referencing “ancient cultural values” and the “distant past” without defining what those things are or researching language history

@eviljoemcveigh: the linguistics is garbage so regression methods, covariates, and other statistical decisions are uninformed.

What do you get when you take an outdated hypothesis, add a false premise, and stir in some bad linguistics?

The conclusion:

Feldmann concludes that dropping a pronoun has a “negative effect of human capital” and that speakers of those languages have less education. Many people on Twitter were reminded of a similar conclusion by the Church of the Flying Spaghetti Monster in an open letter to the Kansas School Board.

The thing is, if you’re not putting in solid research and defined linguistic variables, the conclusion is moot. Feldmann’s conclusion is punching down at countries with less access to education and claiming that no one’s to blame because language. But there are guilty parties in the disparities in education around the world. A linguistics website isn’t the best place to learn about them, but this paper isn’t just bad linguistics, it’s bad anthropology, bad economics, bad statistics, bad research design, and bad critical thinking.

This bish’s conclusion? Sashay away, Feldmann!

Special thanks to Joe McVeigh (@Eviljoemcveigh), Lee Murray (@MurrayLeeA), Gretchen McCulloch (@GretchenAMcC), James Harbeck (@sesquiotic), and Nic Subtirelu (@linguisticpulse).

Recommended for no bishes!

————————————————————
Feldmann, Horst. “Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop.” Kyklos: International Review for Social Sciences, 8 Nov. 2018, doi:10.1111/kykl.12190.

Chen, M. Keith (2013). The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets, American Economic Review. 103(2): 690‐731.


Emoji Grammar as Beat Gestures


If you’re a Lingua Bish, you probably know about celebrity linguists Dr. Gretchen McCulloch😻 and Dr. Lauren Gawne 😻. In their presentation at the 1st International Workshop on Emoji Understanding and Applications in Social Media in June 2018, they presented their research to answer the question once and for all, Are emojis a language 🤔? But actually, Gretchen and Lauren always use emoji as the plural of emoji (bishes don’t), and their research question was “If languages have grammar and emoji are supposedly a language, then what is their grammar?”

If you try to compare emojis to language, the closest you’ll get is word units. Of all the bits of a language, emojis are most similar to words, but language is so much more than a bunch of words. It has parts of speech and structure (and so many other things). Emojis often affect the tone of text or add a layer of emotion😏, but Lauren and Gretchen think that’s just a small part of it because their effect isn’t always straightforward. To compare emojis to words, they decided to look at the most used word sequences and compare them to the most used emoji sequences. They hypothesized that if emoji sequences are repeated they should be considered “beat” gestures, but what is that even?

Beat Gestures and Emojis

So gestures are a different type of communication🖐. They are not a language and they don’t have grammar. 

a beat gesture and definitely cool

One type of gesture is the “beat” gesture. It is characterized by its absence of meaning and its repetitive nature. You use beat gestures when you talk with your hands👐 and most gestures politicians make during speeches are beat gestures.

not cool and not a beat gesture

However, when a really cool person bobs their open palms up and down in the air above their head, you know it means “raise the roof”, so this is not a beat gesture. It seems like emojis act the same way as beat gestures, often repetitive and often with no inherent meaning unless accompanied by words🤯.

The Emoji Corpus

Gretchen and Lauren used a SwiftKey emoji corpus to check out sequences of two, three, and four emojis. That means that they looked for groups of emojis that often appear together. They looked for the 200 most common sequences and noticed that the top sequences used just one repeated emoji. These were the top 10 sequences in the SwiftKey emoji corpus:

The Word Corpus

Then they used the Corpus of Contemporary American English (COCA) to check out word sequences to compare to the emoji sequences. The COCA contains around 500 million words from things like news outlets and websites👩‍💻. In the 200 most common word sequences, they found almost no repetition. The only times words were repeated were in the cases of “had had” and “very very very.” However, these didn’t even make the top 200. And yes, that could just be because the COCA is formal and perhaps a corpus of informal language would have yielded different results. For example, you might get instances of what linguists call the ‘salad-salad reduplication’ (2004) as in “it’s salad salad🥗, not ham salad or jello salad”. It’s the same as “OMG you like like them 😲??” or “It’s Saturday. Tonight I’m going out out💃,” but this bish is digressing.
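For the corpus bishes who want to poke at this themselves, here is a minimal sketch of the comparison described above: count sequences of two, three, and four tokens, keep the most common ones, and check how many are just one token repeated. This is not Gretchen and Lauren's actual code. The function names and the toy messages are made up for illustration, and the real SwiftKey and COCA data are obviously much bigger.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_sequences(messages, sizes=(2, 3, 4), k=200):
    """Count n-grams inside each message and return the k most common."""
    counts = Counter()
    for tokens in messages:
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts.most_common(k)


def share_repeated(top):
    """Fraction of sequences that consist of one token repeated."""
    if not top:
        return 0.0
    repeated = sum(1 for seq, _ in top if len(set(seq)) == 1)
    return repeated / len(top)


# Toy data, invented for illustration only (not from either corpus).
emoji_messages = [
    ["😂", "😂", "😂"],
    ["😂", "😂"],
    ["❤", "❤"],
    ["🙈", "🙉", "🙊"],
]
word_messages = [
    sentence.split()
    for sentence in [
        "i do not know",
        "i do not think so",
        "going out out tonight",
    ]
]

emoji_share = share_repeated(top_sequences(emoji_messages))
word_share = share_repeated(top_sequences(word_messages))
```

On the toy data, a much larger share of the emoji sequences are pure repeats than the word sequences, which is the same pattern the paper reports at scale. Sequences are counted inside each message so that they never run across a message boundary; that is a design choice of this sketch, not something the post specifies.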

Comparing Words to Emojis

The point is, where words are very rarely repeated in a sequence, it appears that emojis are. You’re probably like, “but I send 2-4 emojis at a time and they don’t repeat.” Ya, you might, but I bet they’re pretty similar like 5 different hearts💝💘💖💗💓, or the hear-no-evil monkeys🙈🙉🙊, or allll the dranks🍾🍹🍸🥃🍷🥂🍺. So ya, sometimes they’re all different, but if so, they’re likely on a theme.

But even though emojis can be more repetitive than speech or writing, most emojis occur next to words and not in sequences. Even where emojis occur without words, it’s mostly just one or two at a time and usually in response to a previous message. Guess who else usually partners with words? You guessed it, beat gestures👊! 

It seems like emojis and beat gestures have a lot in common. Let’s list the ways: 

  1. no grammatical structure
  2. no inherent meaning unless accompanied by words
  3. often repeated
  4. often add emphasis

Maybe emojis and beat gestures should get a room already 👉👌😜.

Conclusion

Basically the idea is just to shift the way we think of emojis. Thinking of them as a new language with grammar won’t get research far. Gretchen and Lauren might be on to something by considering emojis to be a type of gesture. Emojis don’t have their own grammar, but they work with our written grammar. They add emphasis, just like beat gestures do with our spoken grammar. So, it’s unlikely that emojis can ever be a full language. If they ever start exhibiting structural regularities in corpus studies though, and start languagifying, I’m sure Gretchen and Lauren will be there to catch it.

This paper is great for emoji bishes👯‍, anyone who texts📱, corpus bishes, and lingthusiasts👸🏻👸🏿👸🏼👸🏾.

——————————————————————————————————–

McCulloch, Gretchen, and Lauren Gawne. “Emoji Grammar as Beat Gestures.” In: S. Wijeratne, E. Kiciman, H. Saggion, A. Sheth (eds.): Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25-JUN-2018, published at https://ceur-ws.org

Ghomeshi, Jila, et al. “Contrastive Focus Reduplication in English (The Salad-Salad Paper).” Natural Language & Linguistic Theory, vol. 22, no. 2, 2004, pp. 307–357., doi:10.1023/b:nala.0000015789.98638.f9.


Prospects and Challenges of Short-Term Historical Lexicography

My favorite publication is American Speech, a quarterly journal published by Duke University Press. Yes, it’s a little Anglo-centric, but it has my favorite recurring feature Among the New Words. I developed a very close relationship to this feature through my master’s thesis when I used it to comb through and analyze 10 years’ worth of “new words”. That’s around 2500 words and it was an arduous, tedious, fantastic dictionary wonderland that was totally the best and the worst.

Among the New Words, hereafter to be referred to as ATNW, has the lofty mission of documenting new words and uses of words in real time. It is a totally non-traditional style of lexicography. It’s been running regularly since 1941 but had different incarnations as early as 1937. In its nearly 80 years, ATNW has gone from reader-submissions to the internet age. Ben Zimmer and Charles E. Carson decided to look at ATNW’s history and consider its future in the most exciting paper I’ve read all year: Prospects and Challenges of Short-Term Historical Lexicography (2018).

How it all started

In 1933 an English teacher slash Jewish immigrant (slash, from his awesome name, I can only assume refugee from Mordor), Isidor Colodny, started publishing a monthly magazine called Words: A periodical devoted to the study of the origin, history, and etymology of English words. I guess this is the kind of thing people did before Instagram. A couple years later, Isidor (Lord of the 8th ring probably) enlisted Dwight Bolinger, a Spanish teacher with a Ph.D., to write a column called “The Living Language”.

Bolinger noted a very important part of word collection in his introduction to the very first column. He pointed out that new words are often

“…transitory, so that they leave no mark upon the dictionary; and even those which are fortunate enough to make their way into that solemn repository are usually not recorded in such a way as to show just how they came into being, what was their original context, what suggestive power they may have had aside from their literal meaning…”

Which was basically the premise of my whole thesis btdubs. Also, “that solemn repository” is totally what I’m calling dictionaries from now on.

So Bolinger’s original method for The Living Language was to have readers submit new words and words they found to be used in new ways. They were also asked to include information about coinages (unrealistic goals much?). Even with modern resources, we can’t usually accomplish that. Zimmer and Carson use Bolinger’s entry for “hootenanny” as an example of the difficulties of dating coinages pre-internet. Bolinger dated its first use as 1935, but the internet tells us it was used as early as 1906.

Nevertheless, his column reached a broad audience including co-founder of American Speech, H. L. Mencken. Bolinger was invited to join the journal and renamed his column Among the New Words in 1941. A man before his time, he eschewed traditional domestic American life for an international, 3D immersive, freelance experience teaching in Costa Rica and performing his American Speech duties remotely.

Bolinger’s neologism spotting skills were on point. He wrote about -worthy from jump. He noticed that we had gone from seaworthy and trustworthy to all kinds of new worthies like newsworthy, courtworthy, and credit worthy par example. That was 1941. Now we have such beauties as Oscar-worthy, cringe-worthy, and meme-worthy.  

Another thing he got right was that we create new words by pronouncing onomatopoeia as words. He noted ahem and tisk. And that’s totally carried on. Just think of nom nom.

How it all changed

Bolinger passed the torch in 1944 and ATNW met a series of new editors. For the publication’s 50th anniversary, Adele Algeo and her husband John (who were running ATNW at the time) produced a commemorative edition with an overview of the different processes of documenting new words that had been used. Inspired by this, its editors (also Zimmer and Carson + Solomon) did another retrospective for its 75th anniversary.

A lot of methods were used over the years. There was a lot of reading, and submissions by readers in the beginning. In 1997 Wayne Glowka chose the “ask the kids” method by roping in his undergrad students for credit. Also, in the 90s there was this amazing new method created. It was called “electronic database searching.” So, I don’t know… Encarta perhaps? And since 2009, or “The Year of the Tweet” as I call it, access to language changed. The inundation of language from all social media platforms has made tracking neologisms less a matter of collection and more a matter of curation (Zimmer and Carson 2018).

Another cool update is that the publication went digital in 2010. So now when describing a new word, writers can include links to digital media like TV, speeches, music videos, and memes.

The challenges

More access to IRL language use is awesome, but it’s also mo’ words mo’ problems. Ya gotta have a system for using search engines and determining what’s real and what’s just a google algorithm. So, let’s talk about ratchet, shall we? It was the American Dialect Society’s word of the year in 2012. ATNW’s initial treatment of it included these four senses:

  1. (insult) adj Over the top, to the extreme, beyond socially acceptable -1999
  2. (insult) n Woman who is ratchet (as in sense 1) -1999
  3. (neutral or positive) n Type of dance in Shreveport, La., or subgroup of rap music associated with the dance -2004
  4. (positive) adj Excellent, wildly fun, exceeding expectations -2007

According to ATNW’s initial entry it all basically started with one kickass grandmother in Shreveport, Louisiana. That’s right, innovative wordsmith Anthony Mandigo allegedly used a word he’d learned from his granny as the title of his hot new track to usher in a new style of rap, Ratchet Rap. ATNW speculated that the word could have come from “wretched.”

But wait! After the publishing, a reader found an earlier use of the word (that’s called antedating btw). It was used in its first sense in the 1992 song “I’m So Bad” by UGK, a delightful ditty about S-ing one’s own D as far as I can tell. UGK was from Texas. To this day, that’s all ATNW knows.

All of this illustrates that you can’t just do a google search and call it a dictionary. If the ATNW editors were listeners of rap from 1992 Texas, they would have been able to write a much more informed entry. Clearly, people have been using ratchet since before 1992; UGK didn’t make it up. It also is a lesson on diversity and inclusion because, stop me if I’m wrong, but I have an image in my head of what the editors of ATNW and those solemn repositories have traditionally looked like, what kind of music they’ve listened to, and which regional dialects they’ve used and I’m willing to bet “ratchet” wasn’t in their lexicons.

So, when you conduct your search of “electronic databases” and the like, you need to thoroughly investigate the source (time and place) and look for whoever was producing content at that time. Rarely are words coined out of the blue, so if you can’t find any more instances of the word, call a friend. Someone you know knows someone who knows someone from that area. Sherlock the heck out of that shit!

This article is great for historical linguistics bishes, lexicography bishes, and Ben Zimmer stans. 

—————————————————————————————————————————

Zimmer, Benjamin, and Charles E. Carson. “‘Among the New Words’: The Prospects and Challenges of Short-Term Historical Lexicography.” Dictionaries: Journal of the Dictionary Society of North America, vol. 39, no. 1, 2018, pp. 59–74., doi:10.1353/dic.2018.0010.


Joining the Western Region: Sociophonetic Shift in Victoria

One thing that’s always bothered me is the lack of language documentation in rural Canada. Studies of Canadian English represent urban areas. And look, I get it: rural Canadians are spread out thinly across the true north strong and free. Most people live in the urban centers and documenting Canada’s rural dialects would be kind of a big deal. But that means that any claims on “BC English” are about speakers in Vancouver, and even though the population of the city is really diverse, linguistics studies there typically aren’t.

Google says it takes about 18 hours to drive from Vancouver to Fort Nelson.

Even if these studies were more diverse, we’d be no closer to understanding how people in, like, Fort Nelson speak.

And the thing about that that bothers me is that something like 20% of the population in Canada lives rurally. We don’t know what they sound like or what they’re saying to each other.

All of that’s a rant for another time, but you might understand how excited I was when I came across Rebecca Roeder, Sky Onosson, and Alexandra D’Arcy’s paper (2018) “Joining the Western Region: Sociophonetic Shift in Victoria” which looks at the way some British Columbians who are not Vancouverites talk. While Victoria isn’t exactly rural, it is definitely not Vancouver and sometimes that’s enough.

The Study

The purpose of this study was to try to describe the English in Victoria, BC, something this intrepid trio of scholars has been working on for a long time. They used certain linguistic features to conduct a study of language change over time. The participants were 14-98 years old and from diverse backgrounds. 

Speaking of backgrounds, can anyone guess where the first Final Destination was filmed?

Victoria (slash “hi mom!”)

Victoria is sort of a mini Victorian-era England. It’s on what we call The Island, a short ferry ride or flight from Vancouver. It’s an isolated city of around 370,000 people. It was a Hudson’s Bay trading post in 1843 and became a city in 1862. It became the capital of the province in 1871. Private school teachers were imported from England right up to WWII setting the bar for the prestige dialect. And just picture this, I said it was on an island, right? Ya, well it didn’t get regular ferry service to the mainland until 1960. Even though there are now people who commute regularly to the mainland, I have met people who have never been off the island. And I didn’t know this, but the particularly British-y area of Victoria is referred to as the “tweed curtain.” It’s a small, wealthy community with a marina and tea shops (RIP The Blethering Place, tea shop of yore). One would think this modern history of isolation would have some effect on the dialect, no? Well yes, apparently there’s some kind of accent there though its features vary and the population that exhibits them is an aging minority.

Methods

The inquisitive trio used the Synchronic Corpus of Victoria English (SCVE), part of the Victoria English Archive which is comprised of 162 interviews with primarily British-descended Victorians. The speakers range from 1st to 6th generation Victorian (14-98 years old). Some were even related.

For anyone still guessing, maybe “Garrick’s Head” rings a bell

The Sounds

The Canadian Shift

Ok so, the vowels in lit, #blessed, and sass (vowels kit, dress, and trap if you’re new to LinguaBishes) are produced at the same height in the mouth for many English dialects. “Height” refers to where your tongue is when you make a sound. In the Canadian Shift, the vowels in kit, dress, and trap started to lower sometime before 1950. This resulted in a really noticeable change among baby boomers. The shift slowed down for Torontonians, but if you’re a Canadian woman under 40, then you might be as much as a generation ahead of guys you know in the shift. The Shift is more recent in Victoria, perhaps because of its relative isolation. Even though it started later, the youth speak really similarly to other Canadians, which means there was a whole lotta change in a little bit of time. Also because of this late start, older Victorians have higher vowels than their peers across the country.

Raising of ban and bag

Y’all have probably heard, in Canada, we say bag [bg] not bag [bæg] (same with dragon, wagon, and rag). Also, young Canadians in BC and the prairies do the same thing with ban. (See LinguaBishes Vowel Chart) In Victoria, ban and bag are at the same place regardless of age or gender. This shows that it’s probably a solid Victorian feature that’s at least eighty-five years old.

Back-Vowel Fronting

A back-vowel is a vowel that you make in the back of your mouth. Like in “Karen, that’s some hot goss.” Fronting means making the sounds more towards the front of your mouth. It is very common in British Columbia. Back-vowel fronting is a systemic process in Victoria. Your goats, your boots, and your foots seem to be pronounced slightly more in the front of the mouth by women.

Yod

Yod is the insertion of a y sound before a vowel. Tune is a great example. Yodders (many speakers of British dialects) pronounce the word like tyune or even chyune. In American English, yod is disappearing and in Canada, it is disappearing more slowly because it is considered prestigious. Contrary to previous research, this study found yod to be a stable feature in Victoria, but because it is appearing mainly in the word too, it could be another example of back-vowel fronting.

When did the airport ever look like this?

Results

Our three linguists took their results and compared them to the Phonetics of Canadian English thing (PCE) compiled by Boberg (2008). They examined these features across “apparent time” which basically just means they took the age of the speaker into consideration. Their results were pretty close to Boberg’s PCE, but they found trap to be higher and lot-thought-palm higher and backer like Californian English.

OK, so… ?

After World War II, the English school teachers stopped arriving in Victoria and regular ferry service started. Victoria opened up and experienced a quick population growth. This made it ripe ground for dialect leveling or “phonological simplification.” This could have been when back-vowel fronting and the vowel shift happened.

And…? 

So Victorians, especially young Victorians, mostly speak the same as the majority of western Canadians. Basically anyone under the age of 80 speaks a variety that has leveled out to include the Canadian shift and back-vowel fronting. 

BUT the whole aforementioned yod situation shows that Victoria English is holding onto its history. That and the ban/bag raising are hold-outs that were probably unchanged throughout the 20th century. Whereas the low-back merger could have started in Canadian English around 150 years before it got to Victoria. If that’s true, then Canadian English isn’t a single entity that progressed westward during expansion, but a multi-sourced group of dialects. To me, it says we need more surveys of the varieties of BC English from other areas around the province that aren’t Vancouver.

fin

This article is great for phonetics and acoustical analysis bishes, dialect bishes, Canadian bishes, and of course, Final Destination bishes.


Roeder, R., Onosson, S., & D’Arcy, A. (2018). Joining the Western Region: Sociophonetic Shift in Victoria. Journal of English Linguistics,46(2), 87-112. doi:10.1177/0075424217753987

Roeder, Rebecca & Onosson, Sky & D’Arcy, Alexandra. (2015). Simultaneous innovation and conservation: Unpacking Victoria’s vowels.

Boberg, C. (2008). Standard Canadian English. Standards of English,159-178. doi:10.1017/cbo9781139023832.009

The public representation of homosexual men in seventeenth-century England – a corpus based view

Baker and McEnery (2017) wanted to find out what the public representation of gay men was in the 1600s. Of course they weren’t called “gay men” back then and there was a broad range of male-on-male activity that guys could engage in to be considered anything from a sinner to a sorcerer. So this is actually a look at the public representation of guys who did what Jonathan Van Ness would call “gay stuff.”

Unfortunately this study only covers gay men because of how little writing exists about other queer people from that very binary time. The approach was to explore how gay men were written about. And let’s remember that gayness wasn’t just taboo or frowned upon, it was a capital offense and was only legalized in the UK in 1967.

Source

They used the Early English Books Online Corpus version 3 (EEBO v3), which is great, but unfortunately it’s got so much religious stuff (from meeting minutes to plays and journalism) that results are a bit lopsided.

Challenges

As mentioned above, the large number of religious texts skewed the results. For example sodomite is by far the most frequent term, but it’s mostly used in a Bible-y context (you know, the whole Sodom and Gomorrah thing). The word collocates the most consistently with Genesis, filthy and some guy called Lot because the Sodom and Gomorrah story was in a bit of the Bible called Genesis and the city Sodom had the cute nickname Filthy Sodom. And also Lot was there, I guess. In the Bible-y sense, the word connoted wickedness, sin, and other deeply negative things, but not necessarily gay stuff. So none of that information is particularly relevant to the public perception of gay men in the 1600s.

Just as an interesting side note, the word sodomite declined in usage over the century while at the same time there was a rise in church doubt and anti-catholic writings. Also, sodomite collocates with harlot and whore, the only apparent link to sex of any kind.

The other thing is that gay-stuff was just really the most marginal. There was a ton of censorship, with trial records being destroyed and there’s no evidence in the EEBO-v3 of any man self-identifying as ‘into dudes’ because they could have been imprisoned, had their wealth seized, or even been put to death. So what remains in writing is heavily prejudiced, negative, religious, based in mythology, and controlled by the homophobic patriarchy.

Finally there’s the problem of the searching part. Like, what were they to even search the corpus for? They couldn’t search any of our modern terms like homosexual, gay or queer, so then what? What they did was familiarize themselves with the corpus and use their own knowledge and words from the Lexicon of Early Modern English (LEME) and the Historical Thesaurus of the Oxford English Dictionary. They also found more words as they went through. Armed with all the terms for homosexuals and male prostitutes who serviced men they could find, they dove in.

What I did

I took all the words McEnery and Baker searched for and all the words they found in EEBO-v3 and presented them in a dictionary format to accompany this post (click here for dictionary). When possible, I’ve included the metadata from the paper like frequency in the EEBO-v3, era, and definition. From my own brain parts I contributed part of speech and pronunciation. The definitions are those that Baker and McEnery arrived at through collocational analysis. Those without definitions weren’t found in the corpus or weren’t used in a way that allowed for analysis.

Example:

² High Frequency is greater than 500 hits in EEBO-v3, Mid Frequency is between 500 and 100, Low Frequency is from 10 to 100, and Infrequent is anything fewer than 10.
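For the computational bishes, the bands in that footnote can be sketched as a tiny Python function. This is only an illustration: the name `frequency_band` is made up, and because the footnote doesn’t say where a word with exactly 100 or exactly 500 hits lands, the boundary handling below is an assumption.

```python
def frequency_band(hits: int) -> str:
    """Map a raw EEBO-v3 hit count to the bands described in the footnote.

    Assumption: exactly 500 hits and exactly 100 hits both count as Mid;
    the footnote leaves those edge cases open.
    """
    if hits > 500:
        return "High Frequency"
    if hits >= 100:
        return "Mid Frequency"
    if hits >= 10:
        return "Low Frequency"
    return "Infrequent"
```

Under those assumptions, a word with 9 hits is Infrequent and a word with 501 hits is High Frequency.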

Side note: My intention is for this to be fun because some of the words sound ridiculous to our 21st Century ears (he-strumpet comes to mind), but I would like to acknowledge that none of these were kind-hearted terms. They represent oppression and hate written into law. These laws penalized anyone the cisgender heteronormative patriarchy found threatening. I went into this study with a love for lexicography, polysemy, and history, but it’s impossible to explore all of these words without experiencing a deep sadness and regret for the centuries of suffering these words represent.

Conclusion

Seems like only people who thought homosexuality was deviant wrote about it, and they wrote about it meanly. There isn’t a single self-referential use of any of these terms in the whole corpus. However, it is definitely interesting that sexual orientation was at least referenced because there are scholars who claim that homosexuality wasn’t conceivable at that time. These words seem to argue against that.

Also cool is that there are so many different terms. Which to me says that there wasn’t just one concept of a man who was into “gay stuff,” but a variety of different ways to get involved. Sodomy could lead to execution, but ganymede and catamite weren’t accompanied by legal sentences. My favorite realization is that effeminacy wasn’t considered an indicator of sexuality. Apparently, it began to be associated with male homosexuality in the next century at which time guys who were afraid of retribution had to stop kissing each other in greetings and holding hands in public. Finally, it’s interesting that foreign languages and ancient Greek and Roman sources played a big role. And many authors described “these people” and “their acts” as being outside of England. So xenophobic.

Baker and McEnery have one final note for corpus linguists: get back to the text and get into concordancing. It’s called close reading and it involves looking beyond your five word context. Try it. I know I will be.
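If you’ve never seen a concordance, here’s a rough sketch of the idea in Python: pull every hit for a keyword along with a few words on either side, so you can actually read the context instead of just counting. The function name `concordance` and the sample sentence are invented for illustration (it is not text from EEBO), and real concordancers do a lot more than this.

```python
def concordance(tokens, keyword, width=5):
    """Return (left context, hit, right context) for each match of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Grab up to `width` words on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Made-up sample text, purely for demonstration.
sample = "the quick brown fox jumps over the lazy dog and the fox sleeps".split()
hits = concordance(sample, "fox", width=2)
for left, hit, right in hits:
    print(f"{left:>15} [{hit}] {right}")
```

Bumping `width` up is the code version of looking beyond your five word context.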

This article is great for lexicography bishes, history bishes, corpus bishes, and queer bishes.

Click here to proceed to the Dictionary of 17th Century Terms for Homosexual Men

————————————————————————————————————

McEnery, Tony, and Helen Baker. “The Public Representation of Homosexual Men in Seventeenth-Century England – a Corpus Based View.” Journal of Historical Sociolinguistics, vol. 3, no. 2, Jan. 2017, doi:10.1515/jhsl-2017-1003.


Maybe it’s a grime [t]ing: TH-stopping among urban British youth

I’ve been thinking a lot lately about how identity is something that we perform. I was introduced to this idea through my exploration of Iggy Azalea’s persona and performance for my first Linguabishes post (here). It was my first glimpse at the tricky area of identity research. Not dissimilar from code-switching, your identity performance at work is probably super different from the one you perform to your bishes. Identity can change from context to context and it depends on your audience.

Identity is complex and luckily it evolves. Imagine if you were currently performing your identity from age 15.

In Rob Drummond’s recent paper, “Maybe it’s a grime [t]ing: TH-stopping among urban British youth,” he cites Bucholtz & Hall’s (2010:19–25) five principles of identity, the gist of which is that identities are not fully formed, they’re not explicitly conceived, and they’re dynamic.

Adolescence is a time of emerging identities. One way teens attempt to craft their identities is by emulating their role models. Maybe you were a Spice Girls fan in 1997 and tried out your first British accent, or an emo Avril Lavigne fan in 2002 and decided to go out and get a bunch of eyeliner. These would both be conscious attempts to appear to be in the same group or have a similar identity as your role models, but remember identity performance isn’t always a conscious choice.

When Drummond was working on the UrBEn-ID (Urban British English and Identity) Project in Manchester (the one in the UK, ok bishes?), he noticed something interesting about 4 students who liked a specific kind of music: they performed TH-stopping some of the time.

TH-stopping is pronouncing a voiceless th as a t, like ‘thing’ as ‘ting’. While less common than its voiced sister, DH-stopping (pronouncing ‘them’ as ‘dem’), it occurs in many English varieties including West Indian Englishes and Creoles, Jamaican Creole, British Creole, Irish English, and Liverpudlian. It is also associated with AAE, so it can be found in Hip-Hop and Grime.

Have you heard of Grime? It’s a type of music born out of early 2000s East London. Think Fix up, Look Sharp. Grime, like Hip-Hop, is rooted in urban black culture, but blooming out of East London, it is also cross-racial, using a multiethnolect (an ethnically neutral dialect) called Multicultural London English (MLE). More on that (in search of a Multicultural Urban British English (MUBE)).

A lot of previous work has looked at the language-ethnicity link. Does language reflect ethnicity? Or is it a social performance of ethnicity? I guess no one’s really all that sure, but in this specific case, Drummond found that ethnicity was most definitely not a factor.

While most research on adolescent identity, like Eckert’s, takes place in mainstream schools, the adolescents in this study were outside of the mainstream education system. They attended a specialized learning center that was designed for students who didn’t fit into the mainstream system for a variety of reasons. The study took place over 2 years and had 25 participants, but TH-stopping was in such limited use that only 4 boys stood out. To find out why they were TH-stopping, the researchers looked at a whole bunch of different variables including sex, ethnicity, speech context, musical tastes, age, and a bunch more. Which variable stood out may surprise you…

While context was a significant factor (meaning that in a mock job interview TH-stopping didn’t occur), the biggest variable turned out to be music, but not reported taste in music. Specifically, it was whether the subject was observed to be rapping in class. For 3 of the 4 boys, rapping is almost a feature of speech since they regularly slip in and out of it during conversation.

The 4 boys used TH-stopping in conversations where they were trying to show ingroup status with the street, urban, tough culture embodied by Grime. One example is a conversation they had about a mutual acquaintance who was about to get out of jail. They were each trying to show that this person was a friend of theirs. They each in turn referred to him as a tief (for ‘thief’). Another example is of a different boy who in the context of discussing his favorite Grime artist does not TH-stop and then self-corrects in order to use it.

Drummond concludes that among the subjects in this study TH-stopping is not a marker of ethnicity, but a part of identity performance. It is a “linguistic resource” that helps align them with a general sense of tough or street culture embodied by grime.

 

 

And just to be clear, it’s not like listening to this type of music has caused their dialects to change. It’s that in order to show that they live in the Grime world, they occasionally stop a TH and perform in-groupedness. This is the major take-away. That and the fact that ethnicity as a concept is not a meaningful mechanism for grouping people.

This should be taken into account in future studies that attempt to link identity and language.

——————————————————————————————————————-

Drummond, Rob. “Maybe It’s a Grime [t]ing: TH-Stopping among Urban British Youth.” Language in Society, vol. 47, no. 02, 2018, pp. 171–196., doi:10.1017/s0047404517000999.

Eckert, Penelope. “Linguistic Variation as Social Practice: The Linguistic Construction of Identity in Belten High (Review).” Language, vol. 77, no. 3, 2001, pp. 575–577., doi:10.1353/lan.2001.0193

 


Are Emojis Predictable?

Emojis are cool, right? Well typing that sure didn’t feel cool, but whatever. The paper “Are Emojis Predictable?” by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion explores the relationships between words and emojis by creating robot-brains that can predict which emojis humans would use in emoji-less tweets.

But what exactly are emoji (also, is the plural emoji or emojis?) and how do they interact with our text messaging? Gretchen McCulloch says you can think about them like gestures. So if I threaten you by dragging a finger across my throat IRL, a single emoji of a knife might do the trick in a text. But if they act like gestures in some cases, what are we to make of the unicorn emoji? Or the zombie? It’s not representative of eating brains right? Right?? Tell me the gesture isn’t eating brains!

So, obviously, trying to figure out what linguistic roles emoji can play is tough and it doesn’t help that they haven’t been studied all that much from a Natural Language Processing (NLP) perspective. Not to mention the perspective of AI. Will emoji robots take over the world like that post-apocalyptic dystopian hellscape depicted in movies like… the Emoji Movie and… Lego Batman? Studying emojis will not only protect us from the emoji-ocalypse, but also help analyze social media content and public opinion. That’s called sentiment analysis btw, but more on all the things I just tried to learn later.

The Study (or Machine Learning Models, oh my 😖)

For this study, the researchers (from my alma mater, Universitat Pompeu Fabra) used the Twitter APIs to determine the 20 most frequently used emojis from 40 million tweets out of the US between Oct 2015 and May 2016. Then they selected only those tweets that had a single emoji from the top 20 list. That came to more than 584,600 tweets. Then they removed the emoji from each tweet and trained machine learning models to predict which one it was. Simple, right?
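To make those steps a little more concrete, here’s a minimal Python sketch of the data prep. It is definitely not the researchers’ code: the `EMOJI` set, the `build_dataset` function, and the sample tweets are all invented, and counting every emoji occurrence (rather than once per tweet) is an assumption. It also only handles emojis that are a single character under the hood, which plenty of real ones aren’t.

```python
from collections import Counter

# Hypothetical stand-in for a real emoji inventory (single code points only).
EMOJI = {"😂", "😍", "🔥", "💯", "🎄"}

def build_dataset(tweets, top_n=20):
    """Return (text_without_emoji, emoji_label) pairs for training."""
    # Step 1: rank emojis by how often they occur across all tweets.
    counts = Counter(ch for tweet in tweets for ch in tweet if ch in EMOJI)
    top = {emoji for emoji, _ in counts.most_common(top_n)}

    dataset = []
    for tweet in tweets:
        found = [ch for ch in tweet if ch in EMOJI]
        # Step 2: keep only tweets with exactly one emoji, from the top list.
        if len(found) == 1 and found[0] in top:
            label = found[0]
            # Step 3: strip the emoji so a model has to guess it.
            text = " ".join(tweet.replace(label, " ").split())
            dataset.append((text, label))
    return dataset

# Made-up tweets, purely for demonstration.
tweets = [
    "that show was hilarious 😂",
    "so tired 😂😂",
    "this mixtape is 🔥 no cap",
    "🔥 set last night",
    "merry christmas everyone 🎄",
    "no emoji here",
]
pairs = build_dataset(tweets, top_n=2)
```

With `top_n=2`, only 😂 and 🔥 make the cut, so the 🎄 tweet, the double-😂 tweet, and the emoji-less tweet all get dropped.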

Now just to be clear, the methods in this study are way above my head. I don’t want anyone confusing me for someone who understands exactly what went on here because I was fully confused through the entire methods section. I tried to summarize what little understanding I think I walked away with, but found there was just way too much content. So here is a companion dictionary of terms for the most computationally thirsty bishes (link).

So actually two experiments were performed. The first was comparing the abilities of different machine learning models to predict which emoji should accompany a tweet. And the second was comparing the performance of the best model to human performance.

The Robot Face-Off (🤖 vs 🤖)

In the first experiment, the researchers removed the emoji from each tweet. Then they used 5 different models (see companion dictionary for more info) to predict what the emoji had been:

  1. A Bag of Words model
  2. A Skip-Gram Average model
  3. A bidirectional LSTM model with word representations 
  4. A bidirectional LSTM model with character-based representations 
  5. A skip-gram model trained with and without pre-trained word vectors

They found that the last three (the neural models) performed better than the first two (the baselines). From this they drew the conclusion that emoji collocate with specific words. For example, the word love collocates with ❤. I’d also like to take a moment to point out this study, which shows that emojis are mostly used alongside words and not to replace them. So we’re more likely to text “I love you ❤” than “I ❤ you.”

 

The Best “Robot”

The best performing model was the char-BLSTM with pretrained vectors on the 20 emojis. Apparently frequency has a lot to do with it. It shouldn’t be surprising that the model predicts the most frequent emojis more frequently. So in a case where the word love is used with the 💕, the model would prefer ❤. Also the model confuses emojis that are used in high frequency and varied contexts. 😂 and 😭 are an example of this. They’re both used in contexts with a lot of exclamation points, lols, hahas, and omgs and often with irony.

The case of 🎄 was interesting. There were only 3 in the test set and the model correctly predicted it on the two occasions where the word Christmas was in the tweet. The one case without it didn’t get the correct prediction from the model.

Second experiment: 🙍🏽vs 🤖

The second experiment was to compare human performance to the character-based representation BLSTM. These humans were asked to read a tweet with the emoji removed and then to guess which of five emojis (😂, ❤, 😍, 💯, and 🔥) fit.

They crowdsourced it. And guess what? The char-BLSTM won! It had a hard time with 😍 and 💯 and humans mainly messed up 💯 and 🔥. For some reason, humans kept putting in 🔥 where it should have been 😂. Probably the char-BLSTM didn’t do that as much because of its preference for high frequency emojis.

Conclusion

The BLSTMs outperformed the other models and the humans. Which sounds a lot like a terminator-style emoji-ocalypse to me. This paper not only suggests that an automatic emoji prediction tool can be created, but also that it may predict emojis better than humans can and that there is a link between word sequences and emojis. But because different communities use them differently and because they’re not usually playing the role of words necessarily, it’s excessively difficult to define their semantic roles not to mention their “definitions.” And while there are some lofty attempts (notably Emojipedia and The Emoji Dictionary) to “define” them, the lack of consensus makes this basically impossible for the vast majority of them.

I recommend this article to emoji kweens, computational bishes 💻, curious bishes 🤔, and doomsday bishes 🧟‍♀️.

Thanks to Rachael Tatman for her post “How do we use Emoji?” for bringing some great research to our attention. If you don’t have the stomach for computational methods, but care about emojis, then definitely check out her post.

 


 

Barbieri, Francesco, et al. “Are Emojis Predictable?” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2017, doi:10.18653/v1/e17-2017.

Dürscheid, C., & Siever, C. M. (2017). “Beyond the Alphabet–Communication of Emojis.” Short version of a manuscript submitted for publication (in German).

Tatman, Rachael. “How Do We Use Emoji?” Making Noise & Hearing Things, 22 Mar. 2018, makingnoiseandhearingthings.com/2018/03/17/how-do-we-use-emoji/.


Companion to “Are Emojis Predictable?”

Welcome to the companion to

Are Emojis Predictable?

by Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion.

This is where I’ve attempted to provide some semblance of explanation for the methods of the study. Look, I tried my best with this, so don’t judge. I ordered it in terms of the difficulty I had instead of alphabetically. References at the end for thirsty bishes who just can’t get enough.

Difficulty NLP Model or Term
😀 Sentiment Analysis

A way of determining and categorizing opinions and attitudes in a text using computational methods. Also opinion mining.

☺️ Neural Network

A computer network that’s based on how the human brain works.

🙂 Recurrent Neural Network

A type of neural network that can be trained by algorithms and that stores information to make context-based predictions. Also RNN.

🙂 Bag of Words

A model that basically counts up the number of instances of words in a text. It’s good at classifying texts by word frequencies, but because it determines words by the white space surrounding them and disregards grammar and word order, phrases lose their meaning. Also BoW.
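Here’s about the smallest bag of words you can build in Python, just to show the “word order gets thrown away” problem. The function name `bag_of_words` and the example sentences are made up, and a real setup would use proper tokenization instead of a bare split on white space.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Lowercase, split on white space only, and count; order is discarded.
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)  # True: identical counts, even though the meaning flipped
```

Same bag, very different afternoon for the man.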

😐 Skip Gram

A neural network model that does the opposite of the continuous bag of words (CBOW) model. Instead of looking at the whole context, the skip gram considers word pairs separately. It’s trying to predict the context from a word, so it weighs closer words more than further ones. So the distance between words is actually relevant. Also Word2Vec.
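The “word pairs” part is easier to see in code. This little Python sketch only builds the (center word, context word) pairs that a skip-gram model would be trained on; it doesn’t train anything. The function name `skipgram_pairs` and the `window` parameter are illustrative, not taken from the paper or from any particular library.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("i love you so much".split(), window=1)
for center, context in pairs:
    print(center, "->", context)
```

With `window=1`, each word is only paired with its immediate neighbors, so “love” gets paired with “i” and “you” but never with “much”.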

😐 Long Short-term Memory Network

A recurrent neural network that can learn the orders of items in sequences and so can predict them. Also LSTM.

😑 Bidirectional Long Short-term Memory Network

The same as above, but it’s basically time travel because half the neurons are searching backwards and half are searching forwards even if more items are added later. Also BLSTM.

😓 Char-BLSTM

A character-based approach that learns representations for words that look similar, so it can handle alternatives of the same word type. More accurate than the word-based variety.

😖 Word-BLSTM

Some kind of word-based variant of the above? Probably?

🤮 Word Vector

Ya, this one is umm… well, you see, it has magnitude and direction. And like, you have to pre-train it. So… “Fuel your lifestyle with .”

Congratulations if you’ve made it this far! You probably already know more than me. Scream it out. I know I did 🙂

 


 

REFERENCES

Bag of Words (BoW) – Natural Language Processing, ongspxm.github.io/blog/2014/12/bag-of-words-natural-language-processing/.

Britz, Denny. “Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.” WildML, 8 July 2016, www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/.

Brownlee, Jason. “A Gentle Introduction to Long Short-Term Memory Networks by the Experts.” Machine Learning Mastery, 19 July 2017, machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/.

Brownlee, Jason. “A Gentle Introduction to the Bag-of-Words Model.” Machine Learning Mastery, 21 Nov. 2017, machinelearningmastery.com/gentle-introduction-bag-words-model/.

Chablani, Manish. “Word2Vec (Skip-Gram Model): PART 1 – Intuition.” Towards Data Science, 14 June 2017, towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b.

Verwimp, et al. “Character-Word LSTM Language Models.” Cornell University Library, 10 Apr. 2017, arxiv.org/abs/1704.02813.

Olah, Christopher. “Understanding LSTM Networks.” Colah’s Blog, colah.github.io/posts/2015-08-Understanding-LSTMs/.

Nielsen, Michael A. “Neural Networks and Deep Learning.” Determination Press, 2015, neuralnetworksanddeeplearning.com/chap1.html.

“Sentiment Analysis: Concept, Analysis and Applications.” Towards Data Science, 7 Jan. 2018, towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17.

gk_. “Text Classification Using Neural Networks.” Machine Learnings, 26 Jan. 2017, machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6.

Thireou, T., and M. Reczko. “Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 3, 2007, pp. 441–446., doi:10.1109/tcbb.2007.1015.

“Vector Representations of Words  | TensorFlow.” TensorFlow, www.tensorflow.org/tutorials/word2vec.

“Word2Vec Tutorial – The Skip-Gram Model.” Word2Vec Tutorial – The Skip-Gram Model · Chris McCormick, mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/.
