Published as a conference paper at ICLR 2020
SEMANTIC ENRICHMENT OF NIGERIAN PIDGIN ENGLISH FOR CONTEXTUAL SENTIMENT CLASSIFICATION
Wuraola Fisayo Oyewusi
Data Science Nigeria
Lagos, Nigeria
wuraola@datasciencenigeria.ai
Olubayo Adekanmbi
Data Science Nigeria
Lagos, Nigeria
olubayo@datasciencenigeria.ai
Olalekan Akinsande
Data Science Nigeria
Lagos, Nigeria
olalekan@datasciencenigeria.ai
ABSTRACT
Nigerian Pidgin, an adaptation of English, has evolved over the years through multi-language code switching, code mixing and linguistic adaptation. While Pidgin preserves many words of the standard English corpus, in both spelling and pronunciation, the fundamental meaning of these words has changed significantly. For example, 'ginger' is not a plant but an expression of motivation, and 'tank' is not a container but an expression of gratitude. The implication is that the current approach of applying English sentiment analysis directly to social media text from Nigeria is sub-optimal, as it cannot capture the semantic variation and contextual evolution in the contemporary meaning of these words. In practice, although many words in Nigerian Pidgin are the same as in standard English, sentiment analysis models built for English are not designed to capture the full intent of Nigerian Pidgin, whether it is used alone or code-mixed. By augmenting scarce human-labelled code-changed text with ample synthetic code-reformatted text and meaning, we achieve significant improvements in sentiment scoring. Our research explores how to understand sentiment in an intrasentential code-mixing and code-switching context where there has been significant word localization. This work presents 300 VADER-lexicon-compatible Nigerian Pidgin sentiment tokens with their scores, and a gold-standard set of 14,000 Nigerian Pidgin tweets with their sentiment labels.
1 BACKGROUND
Language is evolving with the flattening world order and the pervasiveness of social media, which fuses cultures and bridges relationships at a click. One consequence of this conversational evolution is intrasentential code switching: alternation between two languages within a single discourse, where the switch occurs inside a sentence (Koban, 2013). The growing frequency of such switching often leads to changes in the lexical and grammatical context of the language, largely motivated by situational and stylistic factors (Inuwa et al., 2014). In addition, the need to communicate effectively with different social classes has, over a long period, further driven this shift in meaning to serve socio-linguistic functions (Ifechelobi, 2015). Nigeria is estimated to have between three and five million people who primarily use Pidgin in their day-to-day interactions, but it is said to be a second language for a much larger number, up to 75 million people in Nigeria alone, about half the population (Carons & Onyioha, 2012). Pidgin has evolved in meaning relative to Standard English through intertextuality: the shaping of a text's meaning by another text, based on their interconnection and on the audience's interpretation. One of the biggest social catalysts is the emerging urban youth subculture, together with a growing semi-literate lower class, in the chaotic medley of a converging megacity (Igboanusi, 2008; Samanta et al., 2019).

arXiv:2003.12450v1 [cs.CL] 27 Mar 2020

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media and also works well on texts from other domains. The VADER lexicon has about 9,000 tokens, each with its mean sentiment rating. It was built from existing, well-established sentiment word banks (LIWC, ANEW and GI) and incorporates a full list of Western-style emoticons, sentiment-related acronyms and initialisms (e.g., LOL and WTF), and commonly used slang with sentiment value (e.g., nah, meh and giggly) (Hutto & Gilbert, 2014).

Sentiment analysis of code-mixed text has been studied in the literature at both the word and sub-word levels (Prabhu et al., 2016; Roncal, 2019; Jang & Shin, 2010). Samanta et al. (2019) showed that label transfer from monolingual to synthetic code-switched text improves sentiment detection, with significant gains in sentiment labelling accuracy (1.5%, 5.11% and 7.20%) for three different language pairs.
2 METHOD
This study uses both the original and an updated VADER lexicon to compute compound sentiment scores for about 14,000 Nigerian Pidgin tweets¹. The updated lexicon, which adds 300 Nigerian Pidgin tokens² and their sentiment scores, performed better than the original VADER lexicon. The sentiment labels derived from the updated VADER were then compared with labels assigned by expert Nigerian Pidgin speakers.
Figure 1: Methodology for the semantic enrichment of Nigerian Pidgin English for contextual sentiment classification.
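The lexicon-update step can be illustrated with a small, self-contained sketch. It is not the authors' code and deliberately simplifies VADER: it only sums the valences of known tokens and then normalises the sum with x / sqrt(x^2 + alpha), alpha = 15, and labels the result with the +/-0.05 thresholds. The normalisation and thresholds follow VADER's reference implementation and documentation rather than this paper. The Pidgin entries use the average scores reported in Table 2, the English entries are a few valences listed there, and the example tweet is hypothetical.

```python
import math

ALPHA = 15  # normalisation constant in VADER's reference implementation


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into [-1, 1], as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def compound(text, lexicon):
    """Simplified compound score: sum the valences of lexicon tokens, then normalise.

    Unlike real VADER, this ignores negation, boosters, capitalisation,
    punctuation emphasis and multi-word entries.
    """
    tokens = text.lower().split()
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    return normalize(total)


def label(score):
    """Map a compound score to a label using VADER's conventional +/-0.05 thresholds."""
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"


# Tiny stand-in for the English VADER lexicon (valences as listed in Table 2).
english_lexicon = {"angry": -2.3, "annoyed": -1.6, "rage": -2.6, "trouble": -1.7}

# Pidgin entries scored with the averages reported in Table 2.
pidgin_tokens = {"kasala": -2.2, "gbege": -2.9, "para": -2.2}

# The updated lexicon is the English lexicon plus the Pidgin tokens.
updated_lexicon = {**english_lexicon, **pidgin_tokens}

tweet = "dem don cause kasala"  # hypothetical example tweet
for name, lex in [("original", english_lexicon), ("updated", updated_lexicon)]:
    score = compound(tweet, lex)
    print(f"{name}: {score:.4f} {label(score)}")
```

With the original lexicon no token is recognised, so the tweet scores 0 and is labelled neutral; after the update, "kasala" contributes its valence and the tweet becomes negative. With the `vaderSentiment` package, the analogous step is typically to merge such entries into the analyzer's `lexicon` dictionary before calling `polarity_scores`; the paper does not state exactly how the update was applied.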
3 RESULTS
When translating the VADER English lexicon into suitable one-word Nigerian Pidgin equivalents, a total of 300 Nigerian Pidgin tokens were successfully derived from the standard VADER English lexicon. One challenge of this translation is that the direct translation of most sentiment words in the original VADER English lexicon yields phrases rather than single-word tokens, and certain Pidgin words translate to several English words (Table 2).
¹ Link to Nigerian Pidgin tweets and sentiments: https://git.io/JvHrp
² Link to 300 Nigerian Pidgin sentiments and scores: https://git.io/Jv9og
Table 1: Nigerian Pidgin tweets with different sentiment labels ("before" and "after" refer to the VADER English lexicon before and after the Pidgin update)

| Pidgin sentence | Compound score (before update) | Compound score (after update) | Label (before update) | Label (after update) | Label by expert Pidgin speaker |
|---|---|---|---|---|---|
| som teams get black, som get purple but no one fine reach our jersey wey blue | -0.1154 | 0.7964 | negative | positive | positive |
| tiri kondoooooooooooooo! Sabi striker dzeko tear net wit pellegrini assist! | 0.0000 | 0.5080 | neutral | positive | positive |
| gooooooooooooal!!! leonardo spinazzola throw beta cross enta and davide biraschi score for inside hin own post! 0-2 | 0.0000 | 0.6209 | neutral | positive | positive |
| 39 willian try make beta pass, na beg we dey. | 0.0000 | 0.5106 | neutral | positive | positive |
| Na to delete am | 0.0000 | -0.6908 | neutral | negative | negative |
| Abed share your insight with me | 0.2960 | 0.5423 | positive | positive | positive |
| Why greenwood dey play nw?ole you don start | 0.3400 | -0.2500 | positive | negative | negative |
4 CONCLUSION
The sentiment labels generated with our updated VADER lexicon are of higher quality than those generated with the original VADER English lexicon (Table 1). Human annotators were able to capture nuances that the rule-based sentiment labelling could not. Future work could increase the number of instances in the dataset.
REFERENCES
Tosin Carons, Abraham and M. Amaka Onyioha. The origin of pidgin. Afrostyle Magazine, 2, 2012. URL http://www.afrostylemag.com/ASM7/pidgin.html.
C. J. Hutto and Eric Gilbert. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International Conference on Weblogs and Social Media (ICWSM-14), 2014.
Jane Ifechelobi. Code switching: a variation in language use. Mgbakoigba: Journal of African Studies, 4:1–7, 2015.
Herbert Igboanusi. Empowering Nigerian Pidgin: A challenge for status planning? World Englishes, 27:68–82, 2008. doi: 10.1111/j.1467-971X.2008.00536.x.
Yusuf Inuwa, Nuhu, Anne Christopher, Althea, and Haryati Bakrin, Bt. Factors motivating code switching within the social contact of Hausa bilinguals. IOSR Journal of Humanities and Social Science (IOSR-JHSS), 19:43–49, 2014.
Hayeon Jang and Hyopil Shin. Language-specific sentiment analysis in morphologically rich languages. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 498–506. Association for Computational Linguistics, 2010.
Didem Koban. Intra-sentential and inter-sentential code-switching in Turkish-English bilinguals in New York City, U.S. Procedia - Social and Behavioral Sciences, 70:1174–1179, 2013. doi: 10.1016/j.sbspro.2013.01.173.
Ameya Prabhu, Aditya Joshi, Manish Shrivastava, and Vasudeva Varma. Towards sub-word level compositions for sentiment analysis of Hindi-English code mixed text. 2016.
Iñaki San Vicente Roncal. Multilingual sentiment analysis in social media. PhD thesis, Universidad del País Vasco-Euskal Herriko Unibertsitatea, 2019.
Bidisha Samanta, Niloy Ganguly, and Soumen Chakrabarti. Improved sentiment detection via label transfer from monolingual to synthetic code-switched text. arXiv preprint arXiv:1906.05725, 2019.
A APPENDIX
Table 2: Average Sentiment Score for Nigerian Pidgin Sentiments with Multiple English Meanings

| Pidgin word | VADER sentiment tokens and scores | Average score |
|---|---|---|
| kasala | riot (-2.6), riots (-2.), trouble (-1.7) | -2.2 |
| gbege | catastrophe (-3.4), chaos (-2.7), chaotic (-2.2), problem (-1.7), problems (-1.7) | -2.9 |
| para | angry (-2.3), annoyed (-1.6), rage (-2.6) | -2.2 |
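For a Pidgin word with several English senses, the Average Score column appears to be the mean of the VADER valences of those senses. A minimal sketch, assuming plain arithmetic averaging rounded to one decimal (the rounding rule is an assumption), reproduces the para row: (-2.3 - 1.6 - 2.6) / 3 is about -2.17, reported as -2.2.

```python
from statistics import mean

# English VADER tokens and valences that the Pidgin word "para" maps to (Table 2)
PARA_SENSES = {"angry": -2.3, "annoyed": -1.6, "rage": -2.6}


def pidgin_score(senses):
    """Score a Pidgin token as the mean valence of its English senses, rounded to one decimal."""
    return round(mean(senses.values()), 1)


print(pidgin_score(PARA_SENSES))  # prints -2.2
```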
A.1 SELECTION OF DATA LABELLERS
Three people who are indigenes of, or have lived in, the South South region of Nigeria, where Nigerian Pidgin is a prevalent means of communication, were briefed on the fundamentals of word sentiment. Each labelled data point was verified by at least one other person after initial labelling.
ACKNOWLEDGMENTS
We acknowledge Kessiena Rita David, Patrick Ehizokhale Oseghale and Peter Chimaobi Onuoha for using their mastery of Nigerian Pidgin to translate and label the datasets.