
A Corpus-based Analysis of Non-Standard English Features in the Microblogging Platform Tumblr (pp. 295-303)

Download: https://easychair.org/publications/open/1gb
Rosana Villares
Universidad de Zaragoza, Spain.
rosanavillares92@gmail.com
Abstract
Linguistic deviations from the standard use of English in computer-mediated
communication have been described by the literature as distinctive Netspeak features.
Using corpus linguistics and register theory, I examine those linguistic deviations in a
corpus extracted from the microblogging platform Tumblr to describe how language use
is shaped by the Internet medium. The sample of grammatical, discourse and style
features analysed included the use of personal pronouns, idiomatic expressions,
abbreviations, examples, quotations, repetitions, intensifiers, emotional and offensive
language, as well as other features such as break and run-on-sentences, typography,
punctuation, and multimodal elements. The corpus analysis reveals that these texts are
characterised as being mainly short written messages combining typical features of the
written mode of communication and features of extemporaneous, spoken discourse.
Findings also show a non-standard use of punctuation and typography, the inclusion of
multimedia elements that helps to overcome the lack of immediate feedback and the use
of non-segmental phonology in the conversation. The results of the analysis suggest that
Netspeak (non-standard) linguistic features are determined by context and are thus more
likely to appear in digital spheres such as social networks, blogs and chats, among users
who engage in one-to-one conversations or users who belong to the same in-group, and
in communicative situations in which hobbies and personal experiences are the most
frequently discussed topics.
EPiC Series in Language and Linguistics
Volume 2, 2017, Pages 295–303
Professional and Academic Discourse:
an Interdisciplinary Perspective
C.Vargas-Sierra (ed.), AESLA 2016 (EPiC Series in Language and Linguistics, vol. 2), pp. 295–303
1 Internet Language and Netspeak Register
Over the last decades, the fast advancement of the Internet has offered platforms for new genres
and styles of communication that have led to a linguistic freedom never known before outside the
digital medium (ElBekraoui, 2008; Kleinman, 2010; Tagliamonte, 2008). Research regarding the
linguistic features of Computer-mediated communication (CMC) refers to it as written speech because
"Netspeak", as coined by David Crystal (2001), contains features of both modes of communication.
Speech is characterised by heavy use of first and second person pronouns and contractions; its level of
formality is generally low and it can be rude. In writing, interlocutors are physically separated,
feedback is asynchronous, the medium is durable, and participants commonly use a wide range of
lexical choices and complex syntax (Baron, 2010). Furthermore, Netspeak introduces new linguistic
features only visible in the digital medium. While Crystal regards them as an expansion of the
expressive richness of language, others have criticised such creative and innovative uses as deviant
spelling, even though users actually expect to receive some misspelled messages because of the speed and
simplification factors the system brings with it (ElBekraoui, 2008). In addition to technical aspects,
the introduction of linguistic variations and written strategies to overcome some limitations of the
traditional modes of communication gives the impression of closeness in an isolated medium,
much as informal language does, and this creates a sense of community among its users. This is common
in certain Internet genres, whereas others preserve standard language features. Thus,
the view of Netspeak as a register carries more weight than the view that it is a deviation from the
standard language that must be fought at any cost.
A register is a variety of language corresponding to a particular context whose three situational
variables, 'field', 'tenor' and 'mode', have significant and predictable impacts on language use (Lukin et
al., 2008). The field refers to the topic dealt with (e.g., academic, technical, everyday), the tenor
to the relationship between the interlocutors (e.g., power relationship, closeness, age, gender)
and the mode to the channel (e.g., spoken, written, CMC); together they shape language to make it
appropriate to each particular situation. Hence, given the limitations and innovations of the Netspeak
mode, the audience to whom the message is addressed, the relationship between users and the topic
discussed will adjust the linguistic and stylistic characteristics of the message. Since Netspeak is
likely to have far-reaching consequences for a great majority of young people in the early twenty-first
century, a new linguistic discipline such as Internet linguistics, “an approach to understanding how [online]
language [works]” (Young, 2013), is vital to provide future speakers and teachers of English with a
theoretical framework for all these changes in vocabulary, grammar, and spelling, so that they can use the
'Netspeak register' appropriately.
2 Methodology
As argued by Randall (2002), deviations from Standard English are less noticeable on websites
related to formal genres such as newspapers, criticism blogs or academic websites, where the writer’s
message will resemble the standard features of written texts. Following this hypothesis, I collected
a corpus of 527 posts from the microblogging platform Tumblr, regarded as representative of an
informal website accessed mainly by teenagers and young adults who interact freely, sharing and
discussing information and personal experiences, and engaging in many-to-many and one-to-one
conversations. The main purpose of this quantitative analysis was to explore the frequency with which
linguistic deviations occur and the kinds of interaction in which they appear, and to reflect on whether
they are as dangerous for the language as some scholars have argued. The discourse analysis focused on a
detailed exploration of the standard and non-standard use of the grammatical features, discourse, style,
typography and punctuation encountered in the corpus.
3 Results
The texts collected for the corpus, organised from higher to lower frequency, can be classified
according to their format as text (45.92%), image (23.72%), comment (14.04%), question (13.85%),
audio (1.90%) and video (0.57%). The comment and question formats are also popular and stress the
interactivity of the website since users can add their opinions by editing the original post and
ask/answer other users in one-to-one discussions. Following chat conversation and microblogging
format conventions, messages tend to be short, users address one another directly in one-to-one or
many-to-many conversations, and they discuss topics generally regarding filmic material (32.25%),
personal issues and opinions (26.18%), and news (9.11%) in a conversational way, which brings to the
fore the emotive function of language (Peterson, 2011). Supporting previous studies on Internet language use
(Crystal, 2001; Randall, 2002), several standard English features (51.70%) can be seen in the use of
grammar, typography and punctuation rules while non-standard English features represented 48.30%
of the total analysed posts. Another popular element of the medium is the use of multimedia features
(88.66%) as an example of how the new features of CMC can supplement texts written in both
standard and non-standard English by adding the tag system, images and gifs, and hyperlinks to
provide a more precise contextualization.
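As a quick arithmetic check on the format distribution reported above, the sketch below recovers the whole-post counts implied by each percentage for a corpus of 527 posts. The counts are inferred from the published percentages and are not stated in the paper; the names `reported_pct` and `implied_count` are hypothetical.

```python
# Inferred, not reported: post counts per format that would reproduce the
# published percentages for a corpus of 527 posts.
TOTAL_POSTS = 527

reported_pct = {
    "text": 45.92,
    "image": 23.72,
    "comment": 14.04,
    "question": 13.85,
    "audio": 1.90,
    "video": 0.57,
}


def implied_count(pct: float, total: int = TOTAL_POSTS) -> int:
    """Nearest whole number of posts corresponding to a percentage."""
    return round(pct * total / 100)


counts = {fmt: implied_count(pct) for fmt, pct in reported_pct.items()}

# Recompute each percentage from its implied count to check the round trip.
recomputed = {fmt: round(100 * n / TOTAL_POSTS, 2) for fmt, n in counts.items()}

for fmt in reported_pct:
    print(f"{fmt:<9}{counts[fmt]:>4} posts  {recomputed[fmt]:6.2f}%")
print("total", sum(counts.values()))
```

Under this reading the six formats are mutually exclusive and account for the whole corpus, unlike the feature percentages in the following sections, which can overlap within a single post.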
3.1 Discourse Analysis: grammar
In agreement with Baron (2010), the most common grammatical features in the corpus texts were
explicit references to the writer (50.09%) and the reader (29.22%) and the use of intensifiers. These
three features reflect the importance of the users’ interaction in the medium conveying a sense of
inclusive community, and the relevance of the expressive function of language concerning emotions
and interests. Unlike conventional writing, the writer/reader relationship in the corpus texts is a close
one. The use of first person pronouns reflects that the main topics in Tumblr are those related to
personal issues. Thus, it is not surprising to find a high presence of the forms I/my/me or we/our. The
latter is also frequently used as an inclusive pronoun, referring to a group of people interested in the
same hobbies, and usually providing a sense of closeness and common goal e.g., we have literally
created our own dialogue?, can we talk about.... Writers also address their readers by means of
expressions such as you, guys, people. You is used for three different functions: to
address the reader, to generalise when a person gives an example or describes a situation they imagine
will be familiar to the reader, or to reinforce the vocative, e.g., all you guys know.
Repetition (9.11%) and intensifiers (29.03%) are also recurrent linguistic features in the corpus
texts. Repetition mainly occurs in single words e.g., oh gosh oh gosh, really really, this… this photo
man to convey emphasis and relevance. In the case of intensifiers the most frequent ones are so, really
and very accompanying evaluative adjectives such as great, awesome, amazing, or situations where
intensifiers can be replaced by slang vocabulary e.g., hella. Other intensifying forms are the structure
how + adj; superlatives with the particle ever like the cuttiest thing I’ve ever seen; intensifying
adverbs such as absolutely, totally, the over-use of intensifiers more super special awesome; and
creative metaphors and exaggerations e.g., 10000% better, the only news report anyone should care
about or I wish it with the force of one thousand suns.
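The features described above (writer and reader reference, intensifiers, repetition) could be approximated automatically when screening a larger corpus. The following is a minimal sketch and not the procedure used in this study: the word lists are limited to forms quoted in the text, the function names are hypothetical, and simple pattern matching will over-count (for example, so used as a conjunction) and under-count (for example, repetitions separated by punctuation).

```python
import re

# Word lists come from the examples quoted in the text; they are
# illustrative only and far from exhaustive.
PATTERNS = {
    "first_person": re.compile(r"\b(?:i|my|me|we|our)\b", re.IGNORECASE),
    "second_person": re.compile(r"\b(?:you|guys|people)\b", re.IGNORECASE),
    "intensifier": re.compile(
        r"\b(?:so|really|very|absolutely|totally|hella)\b", re.IGNORECASE
    ),
    # A word or two-word sequence immediately repeated, e.g. "really really".
    "repetition": re.compile(r"\b(\w+(?: \w+)?)\s+\1\b", re.IGNORECASE),
}


def tag_features(post: str) -> dict[str, bool]:
    """Mark which candidate features occur at least once in a post."""
    return {name: bool(rx.search(post)) for name, rx in PATTERNS.items()}


def feature_rates(posts: list[str]) -> dict[str, float]:
    """Percentage of posts containing each feature."""
    if not posts:
        return {name: 0.0 for name in PATTERNS}
    tagged = [tag_features(p) for p in posts]
    return {
        name: round(100 * sum(t[name] for t in tagged) / len(posts), 2)
        for name in PATTERNS
    }


sample = [
    "oh gosh oh gosh you guys are really really awesome",
    "can we talk about this",
]
print(feature_rates(sample))
```

Counting posts rather than tokens mirrors the way the percentages in this section are expressed, as shares of the analysed posts.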
Another relevant grammar feature was the use of the indicative, interrogative and imperative
moods. The imperative mood (11.39%) is frequent in Tumblr posts, as in no listen this is actually,
please write, send your Vikings confesion. At times it is informative; at other times it conveys the
writer’s excitement and subjectivity concerning the importance of the topic being discussed,
implying that attention to it should be compulsory. The interrogative mood recurs throughout the
corpus in expressions such as can we or can I, simulating the act of asking permission to talk about a
topic and perhaps conveying immediacy and interactiveness, i.e. what the writer is going to express
is so important and obvious that it is unthinkable it has not been discussed yet. As a result, Tumblr
may come across as a somewhat imposing community, although politeness is present at the same time:
please appears several times and compliments to the readers are present in many of the texts
e.g., you guys are awesome.
Moving to the structure of sentences, the average text is written with punctuation and paragraphs
that give it structure and cohesion, yet it is possible to see 'break sentences' and 'run-on sentences'
in Tumblr. Break sentences (7.97%) arise from the influence of chats, where almost every
sentence is sent incomplete in order to achieve quicker communication. They are also
a strategy for emphasis and for simplifying sentence structures. Run-on
sentences (3.80%), on the other hand, are sentence constructions without any kind of
punctuation. In the context of microblogging, they may be motivated by the tag system,
where punctuation splits a tag, or by the aim of imitating speech, where pauses are made
whenever needed, without any punctuation sign. In the texts analysed they also suggest a
sense of speed in conveying the message. In other words, they reflect ongoing discourse without any
explicit pauses.
Regarding interjections (16.13%), the most repeated word is confirmatory okay, signalling
agreement or serving as a way to start the message. Onomatopoeias wow, ahh, pssst and conjunctions and or
so are also typical features of spoken discourse (Biber et al., 2002), explicitly marking ongoing
discourse. If writers address other users looking for information, answer previous posts, or are
aware of their audience, conventional greetings hello/hi, hey or vocatives dude, yo, it’s okay are
present at the beginning of the message.
Exemplification (12.90%) is a strategy the writer uses to provide contextualization and to make
the message easier for the reader to understand. Examples are introduced mainly by like, often combined
with hedges, as in seems like or just like. Some expressions such as as, an example or including
appear when dealing with academic and more formal topics. Another approach writers use to
contextualize and add credibility to what they say is quoting somebody’s words (11.95%). Direct
speech, sometimes preceded by reporting verbs, is frequent within the medium and is explicitly
marked by means of inverted commas. The most recurrent reporting
verbs are say, be like, tell, think, and go. Was like predominates in the medium because,
while said implies simply quotation, was like gives a general impression of the person being
quoted, i.e. doing something very similar to the particular quoted material rather than focusing on
phrasing details. In news or narrations where dialogue and action are present, other reporting verbs
like ask, reply, exclaim, shout are also used by writers to re-create the context.
3.2 Discourse Analysis: discourse and style
New lexicon on the Internet, leaving aside the vocabulary created because of the technological
nature of the medium e.g., blogger, gif, or timeline, is mainly created by new expressions and words.
The most common word formation methods found in the corpus are abbreviations and acronyms.
Abbreviations (11.95%) in the corpus can be:
1) sound-based u 'you', cos 'because',
2) contractions wanna 'want to', kinda 'kind of',
3) missing termination of the original word congrats 'congratulations', anon 'anonymous' and
4) endings in s or ie pups 'puppies', hommie 'homeless'.
All of them are typical features of spoken language, so their use is not limited to CMC and their
meaning is known to the majority of users. The acronyms (19.35%) found in the corpus
mainly coincide with what the literature describes as regards acronyms in Internet language
(Crystal, 2001, pp. 85-86), and are representative of the different situations where they appear:
1) some acronyms are used in everyday conversation when referring to places USA or
institutions LGTBQ,
2) accepted acronyms such as asap 'as soon as possible', or PS 'post-script',
3) a referential use to mention books, movies, series HP 'Harry Potter', GoT 'Game of Thrones'
and
4) acronyms of fixed expressions that help to save time when typing like omg 'oh my God', idk
'I don’t know', lol 'laughing out loud', or tbh 'to be honest'.
Less common are compounds (2.28%), another word formation process related to terms
created with the birth of the Internet. In Tumblr, words like askbox or fangirl can easily be seen, as
well as a new element called exclamation!compound e.g., young!Oberyn, which “suggests juxtaposition of two
things; one of them is usually a person or character, to describe an alternate state of the subject”
(allthingslinguistic.com, 2013). Blending (1.71%) is commonly used to combine the names of two
people although it is extended to other contexts too e.g., Brangelina 'Angelina Jolie and Brad Pitt',
vlog 'video blog'. The clipping process (1.52%) is also quite popular in the texts selected from the
Tumblr website and words such as feels 'feelings', faves 'favourites', or defs 'definitely' are very
frequent. Finally, the conversion method (3.61%) often happens with nouns originating in new
technologies, which are now used as verbs e.g., to blog, to email, to photoshop or to tweet.
In addition to word formation and the creation of new words, new ways of expression and
communication (29.41%) are popularised by youngsters in social networks mainly through the use
of puns, wordplay, memes and specific types of writing (6.46%) (Young, 2013). According to the
Merriam-Webster Dictionary, an Internet meme is “an idea, behaviour, or style that spreads from
person to person within a culture.” Examples of these meme expressions are also found in Tumblr in
the form of fixed expressions such as mind blow, true story and friendzone; of particular images
related to a specific idea that allow the modification of text on them; and the presence of certain
writing styles like the 'Doge meme', described as an image accompanied by a deliberately broken form of
English written in Comic Sans MS subunits.
Moreover, the analysis of the corpus shows that emotional language (21.63%) recurred in the
Tumblr posts. Onomatopoeias appear at the beginning of sentences functioning as interjections, or in
the middle/end transmitting the writer’s attitude: wow and oh convey surprise; aaahh stands for
understanding; awww means cute, adorable; haha is laughter; and yay means excitement. In addition to
common specific constructions such as the use of
1) intensifiers want something so bad,
2) superlatives + ever be the best human be you can be,
3) exaggerations that child is the messiah or #thanks to be one of those who sold their souls so
you could exist and
4) evaluative remarks God is amazing,
there are verbs concerning positive and negative emotions e.g., love, makes me happy, upset,
brought me to tears, references to religion OH MY GOD, sweet Jesus; and the use of can has a
remarkable role in Tumblr communication e.g. can’t even explain, I have lost the ability to can.
Actually, 'I Can’t Even' is “an Internet slang expression used to indicate that the speaker is in a state of
speechlessness, either as a result of feeling overjoyed or exasperated, depending on the context in
which it is said.” (Knowyourmeme.com, 2014). It is also important to mention that offensive language
(14.61%), as is also the case in informal and conversational language, is a recurrent feature in the
corpus analysed. It mostly appeared in topics dealing with pastimes and opinions, and the degree of
offensiveness varies, from fuck, shit to freaking, bullshit or what the hell? These items function as
insults, modifiers fucking awesome and expressions of astonishment holy crap.
3.3 Discourse Analysis: typography
The most relevant aspects resulting from the corpus analysis regarding typography are the
use of capitalisation, italics, bold, and prolongation of vowels. By these means writers overcome some
limitations of the written medium, such as the lack of tone, rhythm, or any other phonological feature of
communication. Like many of the features previously discussed, these aspects can combine with one
another, be inconsistent and, especially in typography, alternate between standard and non-
standard English. For instance, the lack of capitalisation (32.83%) often refers to those cases in which
a text begins without a capital letter. The absence of capitalisation may mean that the writer does not
follow the norm at the beginning of the first sentence but does so in later lines, or that
the writer uses capital letters after periods but not for proper names.
Italics (9.30%) and bold (8.16%) typography are generally used in quotes and to highlight specific
parts of a text. Bold is used to differentiate titles from the body of the post or to emphasise certain
words. Another way of putting emphasis on relevant fragments without using capital letters is to use
italics and bold at the same time. In the texts analysed, emphasis is also conveyed by adding
letters to a word e.g., helloooo (5.31%) and by using capitalisation on purpose (19.35%) (Peterson,
2011). Within the Tumblr community context, a post written entirely in capital letters
is not considered as annoying as it would be in other genres. Here it indicates the writer is ‘shouting’
(emphasising) to share something they consider of vital importance; it is then up to the reader to
consider whether it was so important or just nonsense. Capitalisation can appear as well in isolated
words, expressions, or sentences the writer wants to emphasise. The purpose behind the capitalisation
and prolongation of letters varies with each situation (the latter is frequently used in one-to-one
situations where one of the participants wants to give an impression of closeness, familiarity,
etc.), but it is mainly to attract the reader’s attention to a specific part of the text.
Finally, in the sample of analysed texts it is possible to find two different types of misspellings:
those that happen because writers do not proofread the text before sending it (1.70%), as in civiilzation
'civilization', humaity 'humanity'; and those made on purpose (10.44%) as a
way of showing that a user belongs to a certain group (with internal jokes and rules) and of imitating
speech in a sort of simplified transcription. Some are accepted, e.g. wanna, yeah, y’all, and others are
not so frequent in the written medium, e.g. imma follow dis, Da fuck they doin ova der, which become a main
feature of meme writing.
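The typographic features discussed in this section lend themselves to simple surface checks. The sketch below is one hypothetical way to flag candidates for manual review, not the coding scheme used for the corpus; the thresholds (three or more identical letters, two or more capital letters in a word) are assumptions, and acronyms such as USA would be flagged as emphatic capitals.

```python
import re

# Assumed thresholds: a letter repeated three or more times counts as
# prolongation; a run of two or more capitals counts as emphatic capitals.
PROLONGATION = re.compile(r"([a-z])\1{2,}", re.IGNORECASE)
ALL_CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def typography_features(post: str) -> dict[str, bool]:
    """Flag candidate typographic features in a single post."""
    # First alphabetic character, or "" when the post has none.
    first_alpha = next((ch for ch in post if ch.isalpha()), "")
    return {
        "no_initial_capital": first_alpha.islower(),
        "prolongation": bool(PROLONGATION.search(post)),
        "capitals_for_emphasis": bool(ALL_CAPS_WORD.search(post)),
    }


print(typography_features("helloooo everyone"))
print(typography_features("READ THIS. READ IT. LEARN IT. PREACH IT"))
```

Deliberate misspellings such as dis or ova der are not covered here, since telling them apart from typing mistakes depends on in-group knowledge rather than on surface form.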
3.4 Discourse Analysis: punctuation
Non-standard use of punctuation is also typical in the selected Tumblr texts. In agreement with
Peterson’s (2011) description of punctuation in the Internet register, the aim of these features is to
express emotions in a computer-mediated communicative situation, in which it is not possible to
maintain face-to-face interaction. In other words, these features add tone, intensity, ‘facial’
expressions, contextualisation and a classification system. The most recurrent of all is the use of
multiple exclamation marks (12.33%) and question marks, as in First impression: friends?!?!!?
friends!!!!!!!! cute hair!!!!!!!!!, or as substitutes for the period (.).
Another key feature of CMC is missing punctuation (35.86%) (Crystal, 2001). Often a text may
initially follow the norm but later punctuation disappears. At other times, the standard use of the
period (.) is substituted by exclamations because, as Crair (2013) notes, nowadays the full stop seems
aggressive rather than neutral. Pauses may be marked by breaking sentences or left to the
reader’s discretion in run-on sentences. Similarly, the apostrophe may be used in one part of a text
and forgotten in another, as in #made him into the hero he shouldnt be #it’s almost selfish of bucky.
The corpus findings also show that different types of spacing (3.80%) convey different meanings
such as urgency READ THIS. READ IT. LEARN IT. PREACH IT, intensification by separating letters
of words just s o goo d, or the opposite, the linking of words to express excitement and amusement
AHHHKISSHIM!!, #followme, a trend influenced by the tag system, since on some websites such as
Twitter commas and periods split the tag into two. Finally, there is the possibility that non-standard
spacing results from a typing mistake when the writer does not check their text, as in anotheractreesa role.
The second most recurrent punctuation symbol in the corpus is the hashtag (#) (8.92%) whereas
the at sign (@) (1.52%) appears mostly when naming a person on Twitter or Instagram “@KattWilliams”.
As argued by Turner (2012), the hashtag has grown in popularity in microblogs, where trending topics
are created, and hashtags help users find topics on the Internet. In Tumblr, the tag system can appear
outside or within the post and serves two main functions: to classify and organise posts in categories so
that certain topics are easier to find or avoid #eurovision #uk #graham norton; and for writers to express
their reactions towards a specific post #i don’t even go hear #but fdtd [From Dusk till Dawn] has been
showing up on my dash a lot now and then i get this and #asjdhflgkjdslck […].
The hashtag is very frequent on websites such as Instagram or Twitter, where people use this strategy
to gain popularity in the form of ‘likes’ and ‘retweets’. Thus, as the Tumblr texts also
illustrate, the use of tags as the main source of web organisation (in standard English posts too) is a
characteristic of CMC that helps avoid the disappearance of posts with the passing of time.
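The corpus examples show that a Tumblr tag can contain spaces and even whole clauses, whereas on sites where punctuation or spacing ends a tag only the first word survives. The sketch below contrasts the two readings of the same tag line. Both functions are illustrative assumptions, not the parsers actually used by either platform, and the `#(\w+)` pattern is only a rough stand-in for single-word hashtags.

```python
import re


def split_tags(tag_line: str) -> list[str]:
    """Treat everything between two '#' signs as one tag (spaces allowed)."""
    return [part.strip() for part in tag_line.split("#") if part.strip()]


# Rough stand-in for single-word hashtags: the tag ends at the first
# character that is not a letter, digit or underscore.
SINGLE_WORD_TAG = re.compile(r"#(\w+)")


def single_word_tags(text: str) -> list[str]:
    """Extract tags under the assumption that a tag is a single word."""
    return SINGLE_WORD_TAG.findall(text)


line = "#eurovision #uk #graham norton"
print(split_tags(line))        # multi-word tag kept whole
print(single_word_tags(line))  # multi-word tag cut after the first word
```

The first reading preserves commentary tags such as #it’s almost selfish of bucky as a single unit, which is what allows tags to carry the writer's reactions as well as topic labels.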
Another feature found in the corpus is the punctuation cross (1.33%), that is, crossed-out text, e.g.,
I don’t like shopping unless I’m buying myself my favourite food hehe and…. It shows the editing of the
text with a comment the writer has eventually decided to delete although, contradictorily, it is still
readable. It adds extra information that may be irrelevant but that the writer wanted to share anyway.
Parentheses often add extra information to the text e.g., an additional comment (but keep in mind I do
have…) or the source of a post, whereas brackets (3.23%) contextualise e.g., [Little girls runs and grabs
his leg] Hello. Asterisks (3.42%), in a similar way, are used for signalling non-verbal actions and
contextualisation. They are another form used to insert comments that step outside the original text:
backstage directions, contextualization, physical reactions (Yus, 2011) e.g., *cough*, *whispers*.
Generally, there is only one word between the asterisks, but descriptions are now becoming more
detailed, which adds richness and importance to the action e.g., *throws smoke bomb, disappears while
youre distracted*, *imagines myself working out* okay that's enough exercise for the year.
Emoticons representing faces (7.59%) or objects (3.23%) are also present in the corpus. As the
literature explains, their purpose is “to express the users’ emotions producing positive judgments
among users but sometimes they can alter the meaning of the message or even invalidating its
propositional content altogether (irony, sarcasm…)” (Yus, 2011, p. 106). In Tumblr, emoticons are
often used literally, matching the writer’s emotions. The most repeated smileys in Tumblr are
happy faces :) and sad faces :(, while :O means surprise and xD accompanies a humorous comment. In the
case of the non-facial emoticons, the most popular by far is <3, meaning love. Non-facial emoticons
usually have their own picture in chat and instant messaging. The image can be added if the
writer knows the web code to insert the emoticon; otherwise, it is easier to use the keyboard symbols.
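The keyboard emoticons named above can be located with a small lookup table. This is an illustrative sketch limited to the five emoticons mentioned in this section, with glosses taken from the text; the pattern and the function name are assumptions, and the literal glosses do not capture ironic or sarcastic uses of the kind Yus (2011) describes.

```python
import re

# Glosses as given in the text for the most frequent emoticons.
GLOSSES = {
    ":)": "happy",
    ":(": "sad",
    ":O": "surprise",
    "xD": "humour",
    "<3": "love",
}

# Face emoticons must not be glued to a letter or digit, so that the "xD"
# inside a longer word is not counted; "<3" is matched anywhere.
EMOTICON = re.compile(r"(?<!\w)(?::\)|:\(|:O|xD)(?!\w)|<3")


def find_emoticons(post: str) -> list[tuple[str, str]]:
    """Return (emoticon, gloss) pairs in order of appearance."""
    return [(m.group(), GLOSSES[m.group()]) for m in EMOTICON.finditer(post)]


print(find_emoticons("so happy :) <3"))
print(find_emoticons("that was hilarious xD"))
```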
4 Conclusion
The results of the present analysis are, broadly speaking, consistent with theoretical approaches
that have described the main stylistic, linguistic and technological features of CMC and that have
described Netspeak, a new mode of communication, as an amalgam of the written and spoken modes
(Crystal, 2001; Peterson, 2011; Randall, 2002) with new approaches of expression and manipulation
of language. The corpus has shown that the language used coincides with Crystal’s description of
“written speech” (2001) because it is mainly written although it reflects features of spoken language,
particularly informal speech, which explains the linguistic deviations from standard English in the
digital medium (Baron, 2010). Examples of distinctive features are the use of I and you pronouns,
interjections, intensifiers, emotional language and slang. Furthermore, the corpus analysis showed that
the digital medium provides new features that ease the contextualization and clarification of the
textual messages by means of multimedia elements or tags on the one hand, while on the other, it
makes it possible to edit a text, and receive feedback easily.
Although the corpus has shown that some standard features of English are used in microblogging
messages, to some extent contradicting the critics’ complaints about the destruction of language
(Crystal, 2001), linguistic innovations typical of Netspeak illustrate that, in fact, “the type of language
that is being created online is affecting day-to-day speech patterns and writing styles of most young
adults” (Baheri, 2013, p. 2). Some of the most recurrent features in the analysis correspond to the
creation of new colloquialisms, acronyms, missing punctuation, non-standard use of capitalization, the
use of new punctuation such as the hashtag, emoticons or asterisks, and the writing of phonological
misspellings to imitate slang. Therefore, a field of research like ‘internet linguistics’ (Young, 2013)
should keep on playing a role in the study of the Netspeak linguistic register and the issues of English
as a Lingua franca in the digital medium.
Broadly, the selected corpus is illustrative of the way the Netspeak register works within a wide
spectrum of formal and informal language that varies depending on the mode, tenor and field (Lukin
et al., 2008) and that informal language often applies to genres of the Internet such as social networks,
blogs, and microblogging platforms (Randall, 2002; Young, 2013; Yus, 2011) whose principal users
are teenagers and young adults. The selected corpus, admittedly small, has shown that the texts in the
microblogging context may be written in standard English, in standard English accompanied with
multimedia files, a blending of standard and non-standard English features, or simply using non-
standard English features. Thus, the place where the communicative exchange takes place does not
seem to determine the use of non-standard English, whose main aim is to foster speed in
communication to shorten the time-lag (ElBekraoui, 2008).
The analysis of the corpus texts suggests that linguistic deviations tend to depend on the tenor and
the relationship between the participants, who are the ones that decide the use of standard and non-
standard English features. The Tumblr texts have illustrated that questions and comments are the main
post types in which non-standard English features appear. In questions, there is a direct one-to-one
relationship between two people who usually share common interests and are familiar with each
other’s blogs, and who engage in a question-answer pattern that many times includes greetings, emotive
language, acronyms, exclamation marks and other features that resemble “written speech.” On the other
hand, the language used in comments and texts has proved to be more diverse since the audience to whom
the message is addressed is unknown and the topic discussed may be serious or informal. In this
particular case, language use may therefore switch from non-standard to standard, by this means
avoiding the familiarity and closeness that non-standard English conveys. Field has also proved to be
an important factor when defining the context of Netspeak communication. Tumblr users represent a
community within which the most relevant topics dealt with are users’ hobbies, personal experiences
and opinions, all of them also recurrent topics in informal speech. The data analysed has shown that
in Tumblr, groups are created because of shared interests, which explains the recurrence of in-group
references, shared humour and the use of linguistic deviations to overcome the constraints of tone and
body language that the digital medium imposes. As Baheri (2013) puts it, users create their own
‘Tumblr’ argot, which enhances creativity and spreads the change and evolution of language.
It is hoped that the analysis of the present study has illustrated how the digital medium fosters the
creation of a new language used on the Internet, sharing features with the spoken and written modes of
communication and promoting its own stylistic devices. Language evolves and yields new genres,
registers and conventions that require a revision, or at least a debate, of normative linguistic rules.
The study shows that these linguistic novelties can be classified within the Netspeak register since
they tend to be constrained by mode, field, and tenor. It is therefore important that research in the field
of Internet linguistics further analyses language evolution and innovation as regards linguistic and
stylistic features in order to determine whether the Netspeak register may eventually lead,
as some scholars have argued, to the destruction of language or, on the contrary, offer an alternative path
for linguistic evolution.
References
Baheri, T. S. (2013). Your ability to can even: A defence of Internet linguistics. The Toast.
Retrieved from http://the-toast.net/2013/11/20/yes-you-can-even/2/
Baron, N. (2010). Discourse structures in instant messaging: The case of utterance breaks.
Language@Internet, 7.4, 1-26.
Biber, D., Conrad, S. & Leech, G. (2002). The grammar of conversation. Longman student
grammar of spoken and written English (8th ed.). Harlow: Pearson Education Limited.
Crair, B. (2013). The period is pissed: When did our plainest punctuation mark become so
aggressive? New Republic. Retrieved from
http://www.newrepublic.com/article/115726/period-our-simplest-punctuation-mark-has-become-sign-anger
Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.
ElBekraoui, M. L. (2008). The impact of the Internet on language. Theory of Electronic Design:
Collected Papers. B. Hokanson (Ed.). Retrieved from
https://wiki.umn.edu/pub/DHA5399/SyllabusSpring2010/DesignTheorySp2008.pdf
Exclamation!compounds. (2013). Allthingslinguistic.com. Retrieved from
http://allthingslinguistic.com/post/46453848763/exclamation-compounds
I can’t even. (2014). Knowyourmeme.com. Retrieved from
http://knowyourmeme.com/memes/i-cant-even
Kleinman, Z. (2010). How the Internet is changing language. BBC News. Retrieved from
http://archive.today/d2E3E
Lukin, A., Moore, A., Herke, M., Wegener, R., & Wu, C. (2008). Halliday’s model of register
revisited and explored. Linguistics and the Human Sciences, 4.2, 187-213.
Peterson, E. E. (2011). How conversational are weblogs? Language@Internet, 8.8, 1-18.
Randall, N. (2002). Lingo online: A report on the language of the keyboard generation. Retrieved
from http://www.arts.uwaterloo.ca/~nrandall/LingoOnline-finalreport.pdf
Tagliamonte, S. A. (2008). Linguistic ruin? LOL! Instant messaging and teen language. American
Speech, 83.1, 3-34.
Turner, J. (2012). #InPraiseOfTheHashtag. The New York Times. Retrieved from
http://www.nytimes.com/2012/11/04/magazine/in-praise-of-the-hashtag.html?pagewanted=all&_r=0
Young, N. (2013). Internet linguistics Q&A with David Crystal. Spark. Retrieved from
http://sparkcbc.tumblr.com/post/52398439754/internet-linguistics-q-a-with-david-crystal
Yus, F. (2011). Cyberpragmatics: Internet-mediated Communication in Context. Google books.
Amsterdam: John Benjamins.