Natural-Language Neutrality in Programming Languages: Bridging the Knowledge Divide in Software Engineering

Ivan Ruby¹, Salomão David²
¹ Osmania University, India
ivanrubyds@gmail.com
² Università della Svizzera italiana (USI), Switzerland
salomao.david@gmail.com
Abstract. This paper introduces an approach to allow English Language Learners
(ELLs) to collaborate in the Software Engineering field using their individual
native languages. Natural-Language Neutrality (NLN) aims to bridge the
Knowledge Divide in Software Engineering by providing tools and methodologies
to allow speakers of different Natural Languages to learn, practice and
collaborate in an environment that is Natural-Language agnostic.
A Knowledge Divide in Software Engineering is constituted by the differences
in the knowledge assimilation capability, between native English-speakers and
ELLs, due to the English-language barrier.
NLN intends to provide standardized methods to enable already-existing and
new Programming Languages to be accessible to learners in their
Natural-language context. The tools created to achieve this purpose, Glotter,
Glotation and the Collaborative Model, are described.
Keywords: Human Computer Interaction, Computer Science Education,
Learning & Collaboration Technologies, Programming Languages
1 Introduction
A Programming Language (PL) is a formal constructed language used to create a
program, a list of instructions, to perform a task.
Although a PL specifies a notation (Aaby, 1996) to write programs, these are often
written with a combination of mathematical and everyday language characters, words
and phrases.
According to World Language Statistics (SIL International, 2015), English is the 3rd
most spoken language in the world, with 5.43% of speakers, behind Chinese¹ and
Spanish with 14.4% and 6.15%, respectively. Nonetheless, a survey of the Syntax,
Semantics, Standard Library and Runtime System of the most used PLs (TIOBE
Software BV, 2015) indicates that the most popular are all English-based.
¹ A group of related varieties of languages spoken in China, described as dialects
of a single Chinese language.
Although Non-English-based PLs exist (Miller, Vandome, & McBrewster, 2012),
currently the most used have syntax, learning resources, Runtime, and Development
Environments that are developed with an English-speaking audience in mind.
Hypothetically, in a universe of more than 7 billion people, to make use of the
speed and computational capacity of machines to solve problems, approximately 94%
of the people would have to express their instructions to the computer in
English, even though they do not speak it as a native language.
Software Engineering is a fast-changing and evolving field. Thus, it is a challenge to
translate and distribute learning material in languages other than English while keeping
pace with technological development. As a result, a non-native English-speaking
student of Software Engineering is often an English Language Learner (ELL), since
the learning process makes use of material and tools that are in English, regardless
of whether the medium of instruction is English or not.
Three factors create a Knowledge barrier, or Knowledge Divide, for ELLs in Software
Engineering: the discrepancy between English not being the most spoken Natural
Language yet being the most widely used in the most popular PLs, the inability of
ELLs to use their native languages, and the constraint of being taught in one
language while practicing the concepts (programming) in a different language altogether.
To keep pace with innovations and generate ideas, people need to be able to produce
and manage knowledge. However, the increase in the 21st century of access to
information has resulted in an uneven overall ability to assimilate it.
Knowledge Divide is a term that denotes the differences between those who have
access to knowledge and can assimilate it, participating in knowledge-sharing and using
it as a tool for development, and others who are impaired in this process (Bindé &
Matsuura, 2005).
A Knowledge Divide in Software Engineering is constituted by the differences in
Software Engineering-related knowledge assimilation capabilities between native
English-speakers and ELLs, due to the English-language barrier.
ELLs need to develop English language and literacy skills in the context of the
subjects being taught to keep up with English-speaking students (Lee, 2005). However, the
linguistic knowledge that students already possess is often not taken into consideration
(Janzen, 2008).
By allowing students to employ their existing language skills, the Knowledge Divide
can be decreased.
Thus, this paper proposes a methodology to bridge the aforementioned Knowledge
Divide.
2 Data Collection
During April 2015, a survey was conducted among 78 students of the University
College of Engineering, Osmania University. The students were from different
streams of Engineering but shared common introductory Programming courses in C and
C++.
The sample was split into two groups, of 34 and 44 students, to have a representative
sample, and the questions presented to the students were intended to study the following
factors:
• Perceived importance of comments in source code
• Perceived importance and difficulty in understanding source code written in a native
language
3 Results
Students rated a program without comments as easier to understand but, when given
a choice, favored the version with comments. When asked about the importance they
attribute to comments, the majority (54%) of the students were neutral. This
inconsistency might suggest that comments are under-used, although of considerable
importance in reading and understanding source code.
Fig. 1. Perceived Difficulty in understanding a program’s source code
Regarding the difficulty and importance of using Native languages in source
code, a similar scenario was observed.
58% of the students found understanding a program written in their mother tongue
difficult. Although a small portion would prefer reading a program written in their
native language, when asked about its importance 45% were neutral.
[Fig. 1 chart: percentage of students (0% to 100%) per difficulty rating (Very difficult, Difficult, Normal, Easy, Very easy), for a program Without Comments and With Comments]
Fig. 2. Perceived importance of a Native-language in understanding source code
It is this considerably undecided portion of the sample that led to the following
questions being raised regarding students’ perceptions:
• Are students aware of the resources available to them?
• Are the resources being presented and contextualized to suit the students’ learning
process?
• What determines the outcome of the learning process in Software Development:
students’ usage of their existing resources or their ability to adapt to the already
established required resources?
With these questions in mind, we decided to construct a learning and practice model
that would highlight the importance of using students’ existing resources.
4 Multilingual vs. NLN Programming Languages
On the one hand, Multilingual PLs, also called International PLs, allow the usage of more
than one Natural Language for writing programs. Such are the cases of ALGOL 68
(Wijngaarden, Mailloux, Peck, & Koster, 1969) and BabylScript (Iu, 2011).
ALGOL 68 is the 1968 version of the Algorithmic Language. It is an imperative PL,
which succeeds ALGOL 60, and provides translations of its Standard in Russian,
German, French, Bulgarian, Japanese and Chinese. The translations allow the
internationalization of the PL.
BabylScript is an open-source, multilingual scripting language that compiles to
JavaScript. It is implemented in Java, by modifying the open-source Mozilla
Rhino JavaScript engine. BabylScript has different language modes in which keywords,
objects, and function names are translated into non-English languages. With this
feature, it allows programmers to write programs in languages other than English.
BabylScript also allows a mixed-language model, in which the same source code can
contain code written in more than one language.
[Fig. 2 chart: percentage of students (0% to 100%) per importance rating: Not important, Fairly important, Neutral, Important, Very important]
At the time of writing, BabylScript has 17 language translations including Chinese,
Hindi, Swahili, Spanish and Russian.
Although Multilingual PLs reduce the initial language barrier, their development and
adoption are threatened by Natural-language isolation. A larger audience
can be engaged, but ultimately only speakers of the same language can collaborate.
So far, the approaches used to create Multilingual PLs have not been standardized,
and a single approach to enable the feature in different PLs, existing as well as
newly created, has not been identified.
On the other hand, NLN is an approach that intends to provide tools and
methodologies to allow speakers of different Natural-Languages to learn, practice and
collaborate in an environment that is Natural-Language-agnostic.
By allowing learners with different Native languages to interact in a unified
platform, the Single Natural Language (English) knowledge requirement can be reduced.
The English-language is still required in Software Engineering. However,
re-establishing a balance between its usage as a Lingua franca and native languages is
desirable, recognizing the existing linguistic diversity (Bindé & Matsuura, 2005).
NLN can be integrated into an already existing or newly created PL, taking
advantage of the most used English-based ones.
Fig. 3. Natural-Language Neutrality Model
At the core of the NLN approach is a Natural-Language Translation mechanism. The
required translation is only of the PL’s keywords, not of the complete source code.
Hence, we came up with a Translation mechanism that can be further exploited.
5 Tools
The proposed tools address source code keywords, comments and Collaboration
between programmers.
Although each tool is designed to iterate over the elements of the Cognitive Dimension
of Bloom’s Taxonomy (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956), they mainly
intend to stimulate the Affective Dimension elements in students, through the inclusion
of their existing linguistic knowledge in the problem-solving process.
5.1 Glotter: A compiler-level Natural-Language Neutrality Enabler
A Glotter is a Lexical Analysis tool that converts the source code Lexical Units (tokens)
from a Source to a Target Natural-Language.
A Source Natural Language can be any existing Natural Language while the Target
Natural-Language is a predefined Bridge Natural Language, a Lingua Franca, which
will enable all other Source Natural Languages to be translated to and from it.
The name is derived from the Greek root glot (from glōtta, meaning tongue or Language)
and the English word enabler. Therefore, a Glotter is a Language Enabler.
The Glotter receives a list of tokens, a list of Language Dictionaries and a selected
Natural-Language, which serves as the context for the translation.
Its integration to a compiler enables the possibility of different (translated) versions
of the same keywords being compiled into a single version. This process ultimately
serves the purpose of enabling a single PL to be used with various Natural Languages,
while maintaining all the syntactic and semantic structure and rules.
Upon processing, if the keyword is present in the selected Language Dictionary, its
value is substituted by the matching value. Otherwise, it is left intact. Although it is
possible to implement an error reporting functionality upon detection of a non-existent
keyword version in the selected Language, this feature might exceed the responsibilities
of a Lexical Analyzer. Furthermore, this error can be reported by the Syntax Analyzer.
Implementation methods.
Embedded. By integrating the Glotter into the compiler, tokens can be translated one at a
time. This approach is more flexible and does not add a performance impact on the
normal working of the compiler. An Embedded Glotter requires modification of the
compiler source code for an existing PL, which is a disadvantage when seamless
integration is expected.
Fig. 4. Embedded Glotter Implementation
Standalone. In this alternative method, the Glotter is separated from the Compiler. The
complete source code (input) is parsed by the Glotter, in a process that involves a
Lexical Analysis (Tokenization) of the given code. Therefore, the source code is Tokenized
twice. This process requires no modification of an existing compiler's source code, a
fact that constitutes an advantage to enabling NLN in already existing PLs.
Fig. 5. Standalone Glotter Implementation
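The Standalone method can be illustrated with a minimal Python sketch that acts as a preprocessor in front of an unmodified compiler. Everything in it is an assumption made for illustration: the Portuguese keyword entries, the names PT_TO_BRIDGE and standalone_glotter, and the deliberately small regular-expression tokenizer, which stands in for the PL's real Lexical Analyzer.

```python
import re

# Hypothetical Portuguese-to-English keyword dictionary (illustrative entries only).
PT_TO_BRIDGE = {"se": "if", "senao": "else", "enquanto": "while", "retorna": "return"}

# A deliberately small tokenizer: a double-quoted string literal, an identifier,
# or any other single character. A real Standalone Glotter would reuse the
# tokenization rules of the PL it targets.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|.', re.DOTALL)


def standalone_glotter(source: str, dictionary: dict[str, str]) -> str:
    """First tokenization pass: rewrite keywords and leave everything else intact."""
    out = []
    for match in TOKEN.finditer(source):
        text = match.group(0)
        # String literals include their quotes, so they never match a keyword entry.
        out.append(dictionary.get(text, text))
    return "".join(out)


translated = standalone_glotter(
    'se (x > 0) { retorna "se"; } senao { retorna 0; }', PT_TO_BRIDGE
)
```

The resulting text would then be handed to the existing compiler, which tokenizes it a second time. Note that the sketch treats every matching identifier as a keyword, so it assumes translated keywords are reserved words in the source language as well.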
Algorithm
Algorithm Glotter
Input: A List of Lexical Units L, a List of Language Dictionaries LD,
       a selected Language S
Output: The List of Lexical Units L, with its keywords translated
  if S = null, then
    S := “default”
  for each token in L, do
    if token(type) = “Keyword”, then
      translatedValue := isPresentInLanguageDictionary(LD, S, token(value))
      if translatedValue != null, then
        token(value) := translatedValue
  return L

Algorithm isPresentInLanguageDictionary
Input: A List of Language Dictionaries LD, a selected Language S,
       a keyword value v
Output: A String with the translation of v, or null
  for each entry in LD(S), do
    if entry(key) = v, then
      return entry(value)
  return null
Note:
It is assumed that the List of Lexical Units consists of objects with at least
type and value properties, and that null is returned when an entry or entry value is not
found in a dictionary.
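The Glotter algorithm above can be sketched in Python as follows. This is one possible reading, not a reference implementation: the Token class, the use of nested dict objects for the Language Dictionaries and the sample Portuguese entries are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str   # e.g. "Keyword", "Identifier", "Number"
    value: str


def glotter(tokens, dictionaries, selected=None):
    """Translate keyword tokens using the dictionary of the selected language."""
    language = selected if selected is not None else "default"
    dictionary = dictionaries.get(language, {})
    for token in tokens:
        if token.type == "Keyword":
            translated = dictionary.get(token.value)  # None when there is no entry
            if translated is not None:
                token.value = translated
    return tokens


# Hypothetical dictionaries: Portuguese keywords mapped to the bridge language.
dictionaries = {"default": {}, "pt": {"se": "if", "senao": "else"}}
tokens = [Token("Keyword", "se"), Token("Identifier", "se"), Token("Keyword", "para")]
glotter(tokens, dictionaries, "pt")
```

Only the first token is rewritten: the second is an identifier, and the third is a keyword without an entry in the selected dictionary, so both are left intact as the algorithm prescribes.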
5.2 Glotation: Natural-Language annotated comments
A Glotation is a special kind of comment that includes a source Natural Language
attribute and the comment message.
The name is derived from the Greek root glot (from glōtta, meaning tongue or Language)
and the English word Annotation, metadata attached to text (in this case, attached to
source code).
The source language attribute can later be used to translate the comment message to
a different Natural Language.
Syntax. @xx message
Where:
@ is a Symbol that denotes a Glotation, xx is a two-letter lowercase ISO
639-1 Language code² and message is the Comment message or text.
Example:
@en This is a Glotation
@pt Esta é uma Glotação
@fr Ceci est un Glotation
The example above creates Glotations in English, Portuguese and French with the
equivalents of This is a Glotation. Each time a user accesses the source code, an
option to translate the Glotations, Glotate, can allow the translations to occur, provided
the user specifies the Environment (target) Language. Therefore, although the comments
can be written in different languages, a user can choose to view all comments in
their own context Natural Language.
The @ symbol is desirable since its usage is not common among the most used PLs.
Therefore, it is possible to avoid confusion between a general comment and a Glotation.
A Glotation translation can be achieved using a Third Party translation service,
which might require an internet connection.
To implement Glotations, the rules of the Syntax Analyzer (Parser) should be
modified. The rules should detect a Glotation by the symbol @ and build an Abstract
Syntax³ node with the following properties:
• type: “Glotation”
• language: two-letter language code (content immediately following the @ symbol)
• value: the message text (separated from the language code by a whitespace)
Therefore, the rules for a well-formed Glotation can be deduced as:
1. Starts with the @ symbol
2. Has no space between the symbol and the following text
3. The text immediately following the @ symbol consists of a two-character string
4. Immediately following the two character string, there is a whitespace
5. After the whitespace follows the comment message with alphanumeric and special
characters, including whitespace
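The five rules above can be expressed as a single regular expression. The Python sketch below is illustrative only: the function name parse_glotation and the use of a dict for the node are assumptions, and the pattern restricts the two-character string to lowercase letters, following the ISO 639-1 requirement stated in the Syntax description.

```python
import re
from typing import Optional

# "@", two lowercase letters, one whitespace character, then a non-empty message.
GLOTATION = re.compile(r"@([a-z]{2})\s(.+)", re.DOTALL)


def parse_glotation(comment: str) -> Optional[dict]:
    """Return a Glotation node for a well-formed comment body, otherwise None."""
    match = GLOTATION.fullmatch(comment)
    if match is None:
        return None
    return {"type": "Glotation", "language": match.group(1), "value": match.group(2)}


node = parse_glotation("@pt Esta é uma Glotação")
```

A comment that fails any rule, such as "@ pt text" (space after the symbol) or "@por text" (three-letter code), yields None and can be treated as a general comment.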
A message should only be translated if the Glotation language is different from the
Language currently being used in the Development Environment by the user.
Therefore, there should be a mechanism to obtain the Development Environment language.
² http://www.infoterm.info/standardization/ISO_639.php
³ Tree representation of the Abstract syntactic structure of source code written in a
programming language
Algorithm
Algorithm Glotation
Input: List of Comment nodes LC, Environment Language L
Output: List of Comment nodes LC
  for each comment in LC, do
    if comment(type) = “Glotation”, then
      if comment(language) != L, then
        comment(value) := Translate(comment(value), comment(language), L)
  return LC
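A minimal Python sketch of the Glotation algorithm follows. The translation step is passed in as a function so that any service can be plugged in; the lookup table used here (PHRASES, lookup_translate) is a hypothetical stand-in for a Third Party translation service and is not part of the proposal.

```python
def glotate(comments, environment_language, translate):
    """Translate every Glotation whose language differs from the environment's."""
    for comment in comments:
        if comment.get("type") == "Glotation":
            if comment["language"] != environment_language:
                comment["value"] = translate(
                    comment["value"], comment["language"], environment_language
                )
    return comments


# Hypothetical stand-in for a translation service: a fixed phrase table.
PHRASES = {("pt", "en", "Esta é uma Glotação"): "This is a Glotation"}


def lookup_translate(text, source, target):
    """Return the known translation, or the original text when none is known."""
    return PHRASES.get((source, target, text), text)


comments = [
    {"type": "Glotation", "language": "pt", "value": "Esta é uma Glotação"},
    {"type": "Glotation", "language": "en", "value": "Already in English"},
    {"type": "Comment", "value": "a general comment"},
]
glotate(comments, "en", lookup_translate)
```

As in the algorithm, only the message is replaced; the language attribute of the node is left as written by its author, so the stored file is not altered by viewing it.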
5.3 Natural-Language Neutrality Collaborative Model for Programming Languages
Making use of the Glotter and Glotations, a collaborative model can be implemented
to allow dissimilar Natural Languages to be used in a programming environment. Such a
model should employ a mechanism that allows a user to write a program with keywords
and comments in their own Natural Language, provided that the same program can be
understood by a user with a different Natural Language.
Translation of keywords and comments can be achieved by the Glotter and
Glotations, respectively, but the key factor lies in the data format used when storing
and exchanging the program among the users.
Fig. 6. NLN Collaborative Environment Workflow
Upon creation, the source code to be exchanged should preferably contain only
Glotations, rather than only general comments or a mix of both. This can be automated in
the source code editor by automatically replacing general comments with Glotations,
provided that the user has already permitted the functionality and chosen the environment
Natural Language. Similarly, the source code should always be stored with the keywords
in the target Natural Language.
Such source code file, with Glotations and keywords in the target Natural-Language,
will serve as the intermediate file format, the essence of the collaborative model.
When a different user receives this same source code, the process of contextualization
can be performed by applying the Glotter and Glotation functionalities, jointly or
separately.
The inverse process can then take place: the second author edits the source code file,
stores it in the intermediate file format and sends it back to the first author.
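The store-and-contextualize cycle described above can be sketched in Python for the keyword part of the file. The dictionaries, the function names to_intermediate and contextualize, and the representation of a program as (type, value) pairs are all assumptions made for the example; Glotations travel unchanged in the intermediate format and would be translated separately on display.

```python
# Hypothetical dictionaries mapping native keywords to the bridge language (English).
NATIVE_TO_BRIDGE = {
    "pt": {"se": "if", "senao": "else"},
    "fr": {"si": "if", "sinon": "else"},
}


def to_intermediate(tokens, language):
    """Store: rewrite keyword tokens from the author's language to the bridge."""
    table = NATIVE_TO_BRIDGE.get(language, {})
    return [(kind, table.get(value, value) if kind == "Keyword" else value)
            for kind, value in tokens]


def contextualize(tokens, language):
    """Load: rewrite bridge keyword tokens into the reader's language."""
    table = {bridge: native
             for native, bridge in NATIVE_TO_BRIDGE.get(language, {}).items()}
    return [(kind, table.get(value, value) if kind == "Keyword" else value)
            for kind, value in tokens]


# A Portuguese-speaking author writes, stores and sends the program.
written = [("Keyword", "se"), ("Identifier", "x"), ("Glotation", "@pt verifica x")]
stored = to_intermediate(written, "pt")
# A French-speaking author opens it, then stores it again to send it back.
opened = contextualize(stored, "fr")
sent_back = to_intermediate(opened, "fr")
```

Because every language maps to and from the same bridge, each additional Natural Language needs only one dictionary, and the file sent back is identical in format to the one originally stored.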
6 Conclusion
Language plays a critical role in a student's effective education. This process also
depends on the teaching institutions taking into consideration the sociocultural aspects of
the learners, such as their identity and experiences (Janzen, 2008; Lee, 2005).
Making the current trends and developments in the Software Engineering field
available should be accompanied by processes, tools, and resources that enable or, at
least, ease the assimilation of this knowledge by the underprivileged. This increase
in literacy would benefit not only the disadvantaged but society as a whole, since
more people would be brought to an acceptable level of literacy and employability,
becoming active contributors in combating poverty.
Although Multilingual PLs exist, a standardized and methodological approach is
required to explore the context of Bridging the Knowledge Divide in Software
Engineering thoroughly. Such can be accomplished through the proposed NLN approach.
Further research should be undertaken to understand its underlying factors, provide
quantitative as well as qualitative indicators of its effectiveness and to incorporate new
tools and methodologies to support it.
References
1. Aaby, A. (1996). Introduction to programming languages. Walla Walla College, Computer
Science Department. Retrieved from
http://www.worldcolleges.info/sites/default/files/aaby.pdf
2. Bindé, J., & Matsuura, K. (2005). Towards Knowledge Societies. UNESCO world report
(Vol. [1]).
3. Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956).
Taxonomy of Educational Objectives: Handbook 1 Cognitive Domain. London: Longmans,
Green and Co Ltd.
4. Iu, M.-Y. C. (2011). Babylscript: Multilingual JavaScript. In OOPSLA ’11: Proceedings of
the ACM international conference companion on Object oriented programming systems
languages and applications companion (pp. 197-198).
http://doi.org/10.1145/2048147.2048204
5. Janzen, J. (2008). Teaching English Language Learners in the Content Areas. Review of
Educational Research, 78(4), 1010-1038. http://doi.org/10.3102/0034654308325580
6. Lee, O. (2005). Science Education With English Language Learners: Synthesis and
Research Agenda. Review of Educational Research, 75(4), 491-530.
http://doi.org/10.3102/00346543075004491
7. Miller, F. P., Vandome, A. F., & McBrewster, J. (2012). Non-English Based Programming
Languages. Alphascript Publishing.
8. SIL International. (2015). Summary by language size | Ethnologue: Languages of the World.
Retrieved May 13, 2015, from https://www.ethnologue.com/statistics/size
9. TIOBE Software BV. (2015). TIOBE Index for June 2015. Retrieved April 23, 2015, from
http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
10. Wijngaarden, A. van, Mailloux, B. J., Peck, J. E. L., & Koster, C. H. A. (1969). Report on
the Algorithmic Language ALGOL 68. Mathematisch Centrum.
11. Wikipedia. (2015). Non-English-based programming languages. Retrieved May 8, 2015,
from http://en.wikipedia.org/wiki/Non-English-based_programming_languages