I have recently been working on a fun corpus linguistics project in collaboration with Victor Kuperman (my supervisor) and Paweł Mandera, supported by the Sherman Centre for Digital Humanities at McMaster University.
The aim of the research was to identify differences in the way native and non-native speakers of English pluralise nouns. In some cases, non-native speakers produce plurals that native speakers may perceive as unusual (e.g., underwears, violences, informations) because, according to the grammatical theory of English, these words denote ‘mass’ or ‘amorphous’ concepts. We used the GloWbE corpus, which contains 1.9 billion words of English, to identify the most frequently pluralised nouns in non-native varieties of English.
Once we had established these nouns, we performed a series of statistical analyses, which objectively identified the 12 semantic categories in which pluralisation occurs most often. Please note that not all the nouns we found are ‘mass’ according to native-speaker grammars, but depending on which variety of English you speak, you may notice some odd-looking plurals.
To make more sense of the interactive semantic network, read this recent abstract.
If you have any questions about this work, do not hesitate to email me at email@example.com.
Graph made with Gephi