Mass counts in the World Wide Web

I have recently been working on a fun corpus linguistics project in collaboration with Victor Kuperman (my supervisor) and Paweł Mandera, supported by the Sherman Centre for Digital Humanities at McMaster University.

The aim of the research was to identify differences in the way native and non-native speakers of English pluralise nouns. In some cases, non-native speakers produce plurals that native speakers may perceive as unusual (e.g., underwears, violences, informations) because, according to English grammatical theory, these words denote ‘mass’ or ‘amorphous’ concepts. We used the GloWbE corpus, which comprises 1.9 billion words of English, to identify the most frequently pluralised nouns in non-native varieties of English.
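To give a rough idea of what "most frequently pluralised" can mean in practice, here is a minimal Python sketch. It is not the pipeline we used on GloWbE: the function names, the toy sentence, and the naive rule that a plural is simply the noun plus "-s" are all assumptions made for illustration.

```python
from collections import Counter


def plural_ratios(tokens, nouns):
    """Return {noun: share of its occurrences that are plural}.

    Assumption: a plural is just the noun plus "-s". A real analysis
    would need lemmatisation and part-of-speech tagging.
    """
    counts = Counter(t.lower() for t in tokens)
    ratios = {}
    for noun in nouns:
        singular = counts[noun]
        plural = counts[noun + "s"]
        total = singular + plural
        if total:  # skip nouns that never occur in the sample
            ratios[noun] = plural / total
    return ratios


def rank_by_plural_ratio(tokens, nouns):
    """Order nouns from most to least often pluralised."""
    ratios = plural_ratios(tokens, nouns)
    return sorted(ratios, key=ratios.get, reverse=True)


# Toy sentence invented for illustration only (not GloWbE data).
sample = (
    "the informations were useful but more information "
    "and more advice is needed"
).split()

ranking = rank_by_plural_ratio(sample, ["information", "advice", "furniture"])
print(ranking)
```

Running the same count separately on each regional section of a corpus, and then comparing the ratios across varieties, is one simple way to spot nouns that are pluralised unusually often in a given variety.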

Once we had established these nouns, we performed a series of statistical analyses, which identified 12 semantic categories in which pluralisation occurs most often. Please note that not all the nouns we found are ‘mass’ according to native-speaker grammars, but depending on which variety of English you speak, you may notice some odd-looking plurals.

To make more sense of the interactive semantic network, read this recent abstract.


If you have any questions about this work, do not hesitate to email me at schmiddf@mcmaster.ca.

Graph made with Gephi
