An Empirical Study of Programming Language Trends

Yaofei Chen, Rose Dios, Ali Mili, and Lan Wu, New Jersey Institute of Technology
Kefei Wang, State University of New York, Albany

Abstract

Predicting software engineering trends is a strategically important asset for both developers and managers, but it's also difficult, due to the wide range of factors involved and the complexity of their interactions. This paper reveals some interesting trends and a method for studying other important software engineering trends. This article trades breadth for depth by focusing on a small, compact set of trends involving 17 high-level programming languages. We quantified many of their relevant factors, and then collected data on their evolution over 10 years. By applying statistical methods to this data, we aim to gain insight into what does and does not make a language successful.
Predicting software engineering trends is a strategically important asset for both developers and managers, but it's also difficult, due to the wide range of factors involved and the complexity of their interactions.1–4 In earlier work,1 we sketched the outlines of a general solution. We had divided the issues into four broad categories:

- Watching software engineering trends. How do we identify, quantify, and measure relevant factors?
- Predicting software engineering trends. How early can we predict success or failure?
- Adapting to software engineering trends. How can we assess a trend's impact on a given sector of activity?
- Affecting software engineering trends. Can we affect them at all? If so, who can affect them? (Academics? Researchers? Governmental agencies? Industrial organizations? Professional bodies? Standards organizations?)
This article trades breadth for depth by focusing on a small, compact set of trends involving 17 high-level programming languages. We quantified many of their relevant factors, then collected data on their evolution over 10 years. By applying statistical methods to this data, we aim to gain insight into what does and does not make a language successful. In the long run, we want to address several questions, including these:

- What determines a programming language's success? The history of programming languages has many instances of excellent languages that fail and lesser languages that succeed, so technical merit is only part of the story.
- What factors should we look at? What are the most important factors of a programming language?
- What are the historical trends? How can we model their evolution?
- Can we predict future trends? If so, how?
- Does governmental support help a language? To what extent? The history of programming languages has at least two examples of languages that were supported by governments but (hence?) didn't succeed.
Focus on programming languages

Although programming languages aren't necessarily what we think of when we talk about software engineering trends, we chose them for this first experiment for several reasons, including the following:

- They are important artifacts in the history of software engineering.
- They represent a unity of purpose and general characteristics across several decades of evolution.
- They offer a wide diversity of features and a long historical context, thereby affording us precise analysis.
- Their history is relatively well documented, and their important characteristics relatively well understood.
Figure 1 summarizes the genesis of today's main high-level languages.5 We chose the following 17 languages for their diversity and their technical or historical interest: Ada, Algol, APL, Basic, C, C++, Cobol, Eiffel, Fortran, Java, Lisp, ML, Modula, Pascal, Prolog, Scheme, and Smalltalk. We focus only on third-generation general-purpose languages and do not include other generations or scripting languages, such as assembly language, SQL, Perl, ASP, PHP, or JavaScript.

[Figure 1. A brief history of high-level programming languages from 1956 to 2004.]
To model these languages' evolution, we represent each language using a set of factors, which we divide into two categories: intrinsic and extrinsic.
Intrinsic factors

Intrinsic factors are those we can use to describe programming languages' general design criteria. We've identified 11 such factors:6,7

- Generality. Avoiding special cases in the availability or use of constructs and combining closely related constructs into a single, more general one.
- Orthogonality. The ability to combine language constructs in some meaningful way such that the interaction of constructs, or the context of use, does not cause arbitrary restrictions or unexpected behaviors.
- Reliability. The extent to which a language aids the design and development of reliable programs.
- Maintainability. The extent to which a language promotes ease of program maintenance, including, among other things, program readability.
- Efficiency. The extent to which a language design facilitates the production of efficient programs. Translators and users should easily recognize constructs that have unexpectedly expensive implementations.
- Simplicity. The simplicity of a language design, including such measurable aspects as the minimality of required concepts and the integrity and consistency of its structures.
- Machine independence. The extent to which the language semantics are defined independently of machine-specific details. Good languages shouldn't dictate the characteristics of object machines or operating systems.
- Implementability. The extent to which a language comprises features that are understood and can be implemented economically.
- Extensibility. The extent to which a language has general mechanisms for users to add features.
- Expressiveness. The ability to express complex computations and complex data structures in appealing, intuitive ways.
- Influence or impact. The extent to which a language has influenced the design and evolution of other languages and the discipline of language design in general.

We chose these factors for their general significance, relative completeness, and relative orthogonality.8 We don't claim that our list is either complete or orthogonal, just that it's sufficiently rich to enable us to capture meaningful aspects of programming language evolution.
Extrinsic factors

Whereas intrinsic factors reflect properties of the language itself, extrinsic factors characterize the historical context in which the language has emerged and evolved; these factors evolve with time. We represent these by chronological sequences of values rather than single values. We've identified six groups of extrinsic factors for this study:

- institutional support,
- industrial support,
- governmental support,
- organizational support,
- grassroots support, and
- technology support.

For example, grassroots support reflects the amount of support that the language is getting from practitioners regardless of institutional, organizational, or governmental pressures. Specific questions include

- How many people consider this their primary language?
- How many people know this language?
- How many user groups are dedicated to the use, evolution, or dissemination of this language?

We decompose and define the other extrinsic factors similarly, using quantitative questions.
Quantifying factors

Most of the intrinsic factors we introduced earlier are factors for which we have a good intuitive understanding but no accepted quantitative formula. To quantify them, we chose for each factor a set of discrete features that are usually associated with it. Then we ranked these features from 1 (lowest) to N (highest), where N is the number of features. We then derived a language's score as the sum of all the scores that correspond to its features.

For example, to quantify generality, we considered 10 features, ranging from offering constant literals (score: 1) to offering generic abstract data types (score: 10). (For a detailed explanation of how we compute all the intrinsic factors, see http://swlab.njit.edu/techwatch.) We acknowledge that this method is controversial because it sounds arbitrary. However, we find it adequate for our purposes because it generally reflects our intuition about how candidate languages compare with respect to each intrinsic factor.
Quantifying extrinsic factors is relatively easy because most of them ask for numbers. We'll just use the numbers as the value of each extrinsic factor. We will encounter difficulties deriving these numbers in practice, but that's a data collection issue (we'll come back to this later), not a quantification issue.
Empirical investigation

Before we present our summary statistical model, we start with the following premises:

- We adopt the intrinsic factors as the model's independent variables, because they influence the fate of a language but are themselves constant.
- Because many extrinsic factors feed into themselves and might influence others, we adopt past values of the extrinsic factors as independent variables.
- We adopt present and future values of the extrinsic factors as the model's dependent variables.
- We don't represent a language's status by the simple binary premise of successful or unsuccessful, as this would be arbitrarily judgmental. Rather, we represent its status by the vector of all its current extrinsic factors.

Thus, as Figure 2 shows, our model's independent variables include the intrinsic factors and the past history of extrinsic factors, and the dependent variables include the current (or future) values of the extrinsic factors.

To evaluate intrinsic factors, we use the quantification procedures we discussed earlier. We refer to the original language manual and determine whether the language offers each relevant feature.
To collect information about grassroots support, we set up a Web-based survey form and invited software engineering professionals to fill it out online. The information we requested from participants pertained to their knowledge of, familiarity with, and practice of relevant languages for the current year (we conducted the survey reported here in 2003) as well as for 1998 and 1993. We publicized our survey widely through professional channels (for example, Google, Yahoo, and other computer professional newsgroups) to maximize participation.
Collecting information for the other extrinsic factors is significantly more difficult than for intrinsic factors or grassroots support. For the sake of illustration, we briefly discuss the factor of institutional support, which requires such information as the number of students who know about a language and the number of students who use some language as their primary language for school work. To derive this factor, we

- selected a set of universities worldwide (in the US, Canada, Europe, Asia, Africa, and the Middle East), where each one represents a class of similar universities;
- obtained syllabus information to infer language usage for 1993, 1998, and 2003;
- obtained enrollment information through published resources or through direct contact; and
- prorated the results for each university in the sample using the number and size of other universities of the same class.
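The proration step can be illustrated with a small sketch. Assuming each sampled university stands in for a class of similar institutions, one simple way to scale its counts is by the ratio of the class's total enrollment to the sampled university's enrollment; this is a hypothetical reading of the step above, not the authors' exact procedure, and all numbers are made up.

```python
# Hypothetical proration sketch: scale each sampled university's language
# counts up to its whole class, using enrollment as the weight.
def prorate(sample_counts, sample_enrollment, class_total_enrollment):
    """sample_counts: students per primary language at the sampled university."""
    scale = class_total_enrollment / sample_enrollment
    return {lang: round(count * scale) for lang, count in sample_counts.items()}

# Example with made-up numbers: a sampled university of 5,000 students
# representing a class with 80,000 students in total.
sample = {"Java": 900, "C++": 600, "Ada": 50}
print(prorate(sample, 5_000, 80_000))  # {'Java': 14400, 'C++': 9600, 'Ada': 800}
```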
[Figure 2. Our model of programming language trends: a function F maps the intrinsic factors I1, ..., Im and the past history of extrinsic factors e1, ..., ek to the current or future extrinsic factors E1, ..., Ek.]
[Figure 3. Usage trends of the eight most popular programming languages (Ada, Basic, C, C++, Fortran, Java, Pascal, and Smalltalk), 1993–2003: (a) all respondents' responses regarding their primary programming language (grassroots support; percent of respondents); (b) students' primary language use for their coursework (institutional support; number of students); (c) companies' primary language use for product development (industrial support; number of companies).]
Data analysis

In this project, we used factor analysis9 to investigate the latent factors in the intrinsic- and extrinsic-factor groups (and canonical analysis for advanced study). In this article, we focus on the raw data, the models we constructed, and the relevant results we derived from our analysis.
Raw data

According to the data we collected, the five most popular programming languages (the ones most people considered their primary programming language) in 1993 were C, Pascal, Basic, Fortran, and C++. The five most popular languages in 1998 were C, C++, Smalltalk, Fortran, and Pascal. The five most popular languages in 2003 were C++, Java, Smalltalk, Ada, and Fortran. Figure 3a shows these languages' usage trends from 1993 to 2003 for the grassroots support factor. The data represents the percentage of survey respondents in 1993, 1998, and 2003 who considered each of these languages as their primary programming language.

Figures 3b and 3c show the sample raw data for institutional support and industrial support, respectively. Figures describing other raw data and the complete data warehouse are available on the project Web site.
Statistical results

We use standard factor analysis and canonical correlation to assess the relationship between variables. We performed two kinds of analysis: one with only the factors in the intrinsic group, and the other with both intrinsic and extrinsic factors.9

We did the first analysis to seek meaningful relationships between a language's intrinsic factors and the value of its dependent variables. As an example, we consider here the impact of intrinsic factors on the number of developers who consider each language as their primary development language. Table 1 presents our sample correlation results.

The data shows that machine independence, extensibility, and generality have more impact on this extrinsic factor than other intrinsic factors. After analyzing this data for all factors, we found that the most important intrinsic factors are generality, reliability, machine independence, and extensibility.
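The kind of per-factor correlation reported in Table 1 can be reproduced in outline as follows. The data arrays here are placeholders, and the article's own analysis relied on factor analysis and canonical correlation in SAS rather than this simplified Pearson computation.

```python
# Simplified sketch of a Table 1 style analysis: correlate each intrinsic
# factor's scores (one value per language) with an extrinsic measure such as
# the number of developers reporting the language as their primary one.
# The numbers below are placeholders, not the study's actual data.
import numpy as np

intrinsic_scores = {                                   # hypothetical factor scores
    "generality":           np.array([30, 22, 35, 33, 25]),
    "machine independence": np.array([8, 4, 5, 10, 7]),
}
primary_developers = np.array([120, 900, 1500, 1400, 300])  # placeholder counts

for factor, scores in intrinsic_scores.items():
    r = np.corrcoef(scores, primary_developers)[0, 1]  # Pearson correlation
    print(f"{factor}: r = {r:.4f}")
```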
We applied the second model to show the correlations between all factors, including intrinsic and extrinsic ones. Most of the time, the relationships in the first model didn't show up in the second analysis. Some relationships were noteworthy, such as those with variables from technology groups; some just show highly related factors between some variables. Space limitations prohibit us from presenting all the results in detail, but the rotated factor pattern for extrinsic factors supports the following two conclusions:

- Factors that fall under institutional support play an important role in many of the seven factors. Perhaps this reflects that, with the five-year step of our study (1993, 1998, 2003), we have an opportunity to show how institutional decisions affect industrial trends through student training.
- Factors that fall under technology support play an important role in many of the seven factors; in fairness, that might be a consequence of a language's success rather than its cause.
To show a language trend, we construct multivariate regression models10 using the independent intrinsic and extrinsic factors. The multivariate regression equation has the form

Y = A + B1X1 + B2X2 + ... + BkXk + E

where Y is the dependent variable's predicted value, A is the Y intercept, B1 through Bk are the regression coefficients, X1 through Xk are the independent variables, and E is an error term.
Table 1. Sample correlation results for intrinsic factors only

Intrinsic factor          Correlation with no. of developers who consider each language their primary one
Generality                 0.6913
Orthogonality              0.0199
Reliability                0.3199
Maintainability            0.0470
Efficiency                 0.0703
Simplicity                –0.4703
Implementability          –0.3390
Machine independence       0.8876
Extensibility              0.7625
Expressiveness             0.3024
Influence/impact           0.0552
We use the SAS statistics package to analyze the raw data and construct the statistical models. Our factor analysis and regression reports are available at the project's Web site.
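As a rough sketch of how such a multivariate regression could be fit outside SAS, the following assumes the independent variables (intrinsic factor scores and past extrinsic values) are arranged in a design matrix X and one dependent extrinsic measure in a vector y; the variable names and data are illustrative, not the study's.

```python
# Sketch of fitting Y = A + B1*X1 + ... + Bk*Xk + E by ordinary least squares.
# X holds one row per language: intrinsic scores plus past extrinsic values.
# The arrays are placeholders; the study itself used SAS.
import numpy as np

X = np.array([[30.0,  8.0, 12.0],   # e.g., generality score, machine-independence
              [22.0,  4.0, 45.0],   # score, and one past extrinsic value per language
              [35.0,  5.0, 30.0],
              [33.0, 10.0,  2.0],
              [25.0,  7.0, 20.0]])
y = np.array([5.2, 30.1, 25.4, 18.0, 10.3])   # dependent extrinsic factor

X_design = np.column_stack([np.ones(len(X)), X])       # prepend intercept column
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # [A, B1, B2, B3]
print("intercept A:", coeffs[0], "coefficients B:", coeffs[1:])
```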
Toward a predictive model

To predict the future trends of programming languages, we can revise our original regression models. The derivative model will show the relationships among the data from 1993, 1998, and 2003. We construct the derivative regression models as follows:

E2003 = A * I + B * E1998 + C * E1993 + D

where E2003, E1998, and E1993 are the values of extrinsic factors in 2003, 1998, and 1993, respectively; A is the parameter matrix for the intrinsic factors; I is the value of the intrinsic factors; B and C are the parameter matrices for the extrinsic factors in 1998 and 1993, respectively; and D is a constant value.
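A minimal sketch of this derivative model follows, assuming each extrinsic factor is fit separately as a linear function of the intrinsic scores and the two earlier extrinsic values, and then shifted forward five years to extrapolate a 2008 value (as the Application section below describes). All numbers are placeholders, and the least-squares fit stands in for the authors' SAS procedure.

```python
# Derivative model sketch: E2003 ~ A*I + B*E1998 + C*E1993 + D, fit over the
# training languages, then reused with the window shifted by five years to
# extrapolate E2008 from (I, E2003, E1998). All data are placeholders.
import numpy as np

I     = np.array([[30.0], [22.0], [35.0], [33.0], [25.0]])  # intrinsic scores
E1993 = np.array([10.0, 40.0,  8.0,  0.0, 30.0])            # one extrinsic factor,
E1998 = np.array([ 8.0, 35.0, 20.0, 10.0, 22.0])            # one value per language
E2003 = np.array([ 5.0, 28.0, 26.0, 22.0, 12.0])

X = np.column_stack([I, E1998, E1993, np.ones(len(I))])      # columns: [A, B, C, D]
params, *_ = np.linalg.lstsq(X, E2003, rcond=None)

# Shift the time window by five years to extrapolate to 2008.
X_2008 = np.column_stack([I, E2003, E1998, np.ones(len(I))])
E2008_pred = X_2008 @ params
print(E2008_pred)
```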
Validation

We construct this derivative model using 12 languages and use five languages to validate it. We consider the extrinsic factor of "What percentage of people knew this programming language in 2003?" and compare the actual value collected from our survey against the predicted value produced by our regression model.

Table 2 presents our results. We used the F-statistic, a standard statistical method for checking whether there are significant differences between two groups, to validate the prediction. In the F-table, for α = 0.05, F must be greater than 4.49 to reject the hypothesis of statistical correlation. Because our F value is 0.235, the hypothesis is validated.
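This validation step can be reproduced in outline with a one-way F-test over the two columns of Table 2. The article does not state exactly how its F value was computed, so treat this as an illustrative reading rather than a reproduction of the reported 0.235.

```python
# Illustrative check of the Table 2 validation: compare actual vs. predicted
# percentages with a one-way F-test. This is one plausible reading of the
# article's F-statistic computation, not a confirmed reproduction of it.
from scipy import stats

actual    = [5.19, 5.90, 7.68, 54.29, 10.06]   # survey values, Table 2
predicted = [6.94, 7.16, 7.74, 48.81,  8.48]   # model values, Table 2

f_value, p_value = stats.f_oneway(actual, predicted)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")
# An F value well below the critical value indicates no significant difference
# between the two groups, i.e., the predictions track the survey values.
```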
Application

Based on the assumption that, on the whole, trends from 1998 to 2008 should be similar to those from 1993 to 2003, we use the following extended derivative model to predict each extrinsic factor's value in 2008 by submitting the value in 1998 to the 1993 position and the value in 2003 to the 1998 position in the model:

E2008 = A * I + B * E2003 + C * E1998 + D

Using this formula, we can get the value for each extrinsic factor in 2008. Figure 4 shows the trends of the most popular languages from 1993 to 2008. It seems that from 2003 to 2008, Java will be the only language that's still increasing in popularity. All the others will decline and begin to enter a stable period where the percentage won't change too much. Because this model is based on past history, it's valid only as long as the past conditions prevail; it doesn't reflect the possible impact of a popular new language's emergence. For example, C# will definitely affect Java's future popularity, so we should revise and improve the predictive model according to new technology changes.
Our statistical analysis has barely explored our data's potential. Prospects of future research include further analyzing our data as well as exploring other compact sets of trends, such as operating systems, database systems, or Web browsers. The combined synthesis of all these studies might give us insights into the evolution of new trends that have evaded classification thus far.11
Table 2. Comparing the model's actual and predictive values

Language     Actual value reported by survey (%)     Predictive value produced by model (%)
Ada          5.19                                    6.94
Eiffel       5.90                                    7.16
Lisp         7.68                                    7.74
Pascal       54.29                                   48.81
Smalltalk    10.06                                   8.48
[Figure 4. Usage trends (percent of respondents) showing the most popular languages (Ada, C, C++, Fortran, Java, Pascal, and Smalltalk) from 1993 to 2008.]
References

1. R.D. Cowan et al., "Software Engineering Technology Watch," IEEE Software, vol. 19, no. 4, 2002, pp. 123–130.
2. G.A. Moore, Crossing the Chasm, Harper Business, 1999.
3. S.T. Redwine and W.E. Riddle, "Software Technology Maturation," Proc. 8th Int'l Conf. Software Eng., IEEE CS Press, 1985, pp. 189–200.
4. P. Brereton et al., "The Future of Software," Comm. ACM, vol. 42, no. 12, 1999, pp. 78–84.
5. E. Levenez, "Computer Languages History," 2 Mar. 2005; www.levenez.com/lang.
6. K.C. Louden, Programming Language Principles and Practice, PWS Publishing, 1993.
7. US Dept. of Defense, "Steelman: Requirements for High Order Computer Programming Languages," June 1978; www.xgc.com/manuals/steelman/t1.html.
8. S. Findy and B. Jacobs, "How to Design a Programming Language: A Survey of Scripting Programming Language Feature Options," 19 Feb. 2004; www.geocities.com/tablizer/langopts.htm.
9. StatSoft, "Principal Components and Factor Analysis," 2003; www.statsoftinc.com/textbook/stfacan.html.
10. A.L. Edwards, Multiple Regression and the Analysis of Variance and Covariance, 2nd ed., W.H. Freeman and Co., 1979.
11. Y. Chen, "Programming Language Trends: An Empirical Study," doctoral dissertation, New Jersey Inst. of Technology, 2003.
About the Authors

Yaofei Chen is a senior researcher at Principia Partners. His research interests are in software engineering and programming languages. He received his PhD in computer and information science from New Jersey Institute of Technology. Contact him at the Dept. of Computer Science, NJIT, Newark, NJ 07102; yfchen@cis.njit.edu.

Rose Dios is an associate professor in the Department of Mathematical Science at the New Jersey Institute of Technology, where she received her PhD in mathematics. Her research interests include risk analysis, statistical decision theory, and reliability theory. Contact her at the Dept. of Mathematics, NJIT, Newark, NJ 07102; rodios@m.njit.edu.

Ali Mili is a computer science professor at the New Jersey Institute of Technology. His research interests are in software engineering. He received his Doctorat es-Sciences d'Etat from the University of Grenoble, France, and his PhD in computer science from the University of Illinois at Urbana-Champaign. Contact him at the Dept. of Computer Science, NJIT, Newark, NJ 07102; mili@oak.njit.edu.

Lan Wu is a doctoral student at the New Jersey Institute of Technology, where she received her MS in computer science. Contact her at the Dept. of Computer Science, NJIT, Newark, NJ 07102; lw7@njit.edu.

Kefei Wang is a statistician in the California Birth Defects Monitoring Program. He received his MS in statistics from the State University of New York, Albany. Contact him at 1917 Fifth St., Berkeley, CA 94710; kwa@cbdmp.org.