72 IEEE SOFTWARE
Published by the IEEE Computer Society
0740-7459/05/$20.00 © 2005 IEEE
feature
programming languages

An Empirical Study of Programming Language Trends

Yaofei Chen, Rose Dios, Ali Mili, and Lan Wu, New Jersey Institute of Technology
Kefei Wang, State University of New York, Albany

What languages did programmers use most in 1993, 1998, and 2003? This analysis reveals some interesting trends and a method for studying other important software engineering trends.

Predicting software engineering trends is a strategically important asset for both developers and managers, but it’s also difficult, due to the wide range of factors involved and the complexity of their interactions [1–4]. In earlier work [1], we sketched the outlines of a general solution. We had divided issues into four broad categories:
■ Watching software engineering trends. How do we identify, quantify, and measure relevant factors?
■ Predicting software engineering trends. How early can we predict success or failure?
■ Adapting to software engineering trends. How can we assess a trend’s impact on a given sector of activity?
■ Affecting software engineering trends. Can we affect them at all? If so, who can affect them? (Academics? Researchers? Governmental agencies? Industrial organizations? Professional bodies? Standards organizations?)

This article trades breadth for depth by focusing on a small, compact set of trends involving 17 high-level programming languages. We quantified many of their relevant factors, then collected data on their evolution over 10 years. By applying statistical methods to this data, we aim to gain insight into what does and does not make a language successful. In the long run, we want to address several questions, including these:
■ What determines a programming language’s success? The history of programming languages has many instances of excellent languages that fail and lesser languages that succeed, so technical merit is only part of the story.
■ What factors should we look at? What are the most important factors of a programming language?
■ What are the historical trends? How can we model their evolution?
■ Can we predict future trends? If so, how?
■ Does governmental support help a language? To what extent? The history of programming languages has at least two examples of languages that were supported by governments but (hence?) didn’t succeed.

Focus on programming languages
Although programming languages aren’t
necessarily what we think of when we talk
about software engineering trends, we chose
them for this first experiment for several rea-
sons, including the following:
■They are important artifacts in the history
of software engineering.
■They represent a unity of purpose and
general characteristics across several
decades of evolution.
■They offer a wide diversity of features and
a long historical context, thereby afford-
ing us precise analysis.
■Their history is relatively well docu-
mented, and their important characteris-
tics relatively well understood.
Figure 1 summarizes the genesis of today’s main high-level languages [5]. We chose the following 17 languages for their diversity and their technical or historical interest:
Ada, Algol, APL, Basic, C, C++, Cobol, Eiffel,
May/June 2005
IEEE SOFTWARE 73
Fortran, Java, Lisp, ML, Modula, Pascal, Prolog, Scheme, and Smalltalk. We focus only on third-generation general-purpose languages and do not include other generations or scripting languages, such as assembly language, SQL, Perl, ASP, PHP, or JavaScript.

[Figure 1 timeline not reproduced. Its time axis runs from 1956 to 2004 in two-year steps, and it places languages and versions from Fortran I, Lisp, Cobol, and Algol 60 through Java 2 (v1.5 beta) and C# 2.0 (beta).]
Figure 1. A brief history of high-level programming languages from 1956 to 2004.

To model these languages’ evolution, we
represent each language using a set of factors,
which we divide into two categories—intrinsic
and extrinsic.
Intrinsic factors
Intrinsic factors are those we can use to describe programming languages’ general design criteria. We’ve identified 11 such factors [6, 7]:
■ Generality. Avoiding special cases in the availability or use of constructs and combining closely related constructs into a single, more general one.
■ Orthogonality. The ability to combine language constructs in some meaningful way such that the interaction of constructs, or the context of use, does not cause arbitrary restrictions or unexpected behaviors.
■ Reliability. The extent to which a language aids the design and development of reliable programs.
■ Maintainability. The extent to which a language promotes ease of program maintenance, including, among other things, program readability.
■ Efficiency. The extent to which a language design facilitates the production of efficient programs. Translators and users should easily recognize constructs that have unexpectedly expensive implementations.
■ Simplicity. The simplicity of a language design, including such measurable aspects as the minimality of required concepts and the integrity and consistency of its structures.
■ Machine independence. The extent to which the language semantics are defined independently of machine-specific details. Good languages shouldn’t dictate the characteristics of object machines or operating systems.
■ Implementability. The extent to which a language comprises features that are understood and can be implemented economically.
■ Extensibility. The extent to which a language has general mechanisms for users to add features.
■ Expressiveness. The ability to express complex computations and complex data structures in appealing, intuitive ways.
■ Influence or impact. The extent to which a language has influenced the design and evolution of other languages and the discipline of language design in general.
We chose these factors for their general significance, relative completeness, and relative orthogonality [8]. We don’t claim that our list is either complete or orthogonal, just that it’s sufficiently rich to enable us to capture meaningful aspects of programming language evolution.
Extrinsic factors
Whereas intrinsic factors reflect properties of the language itself, extrinsic factors characterize the historical context in which the language has emerged and evolved; these factors evolve with time. We represent these by chronological sequences of values rather than single values. We’ve identified six groups of extrinsic factors for this study:
■ institutional support,
■ industrial support,
■ governmental support,
■ organizational support,
■ grassroots support, and
■ technology support.
For example, grassroots support reflects the amount of support that the language is getting from practitioners regardless of institutional, organizational, or governmental pressures. Specific questions include
■ How many people consider this their primary language?
■ How many people know this language?
■ How many user groups are dedicated to the use, evolution, or dissemination of this language?
We decompose and define the other extrinsic factors similarly, using quantitative questions.
Quantifying factors
Most of the intrinsic factors we introduced earlier are factors for which we have a good intuitive understanding but no accepted quantitative formula. To quantify them, we chose for each factor a set of discrete features that
www.computer.org/software
are usually associated with it. Then we ranked these features from 1 (lowest) to N (highest), where N is the number of features. We then derived a language’s score as the sum of all the scores that correspond to its features.
For example, to quantify generality, we considered 10 features, ranging from offering constant literals (score: 1) to offering generic abstract data types (score: 10). (For a detailed explanation of how we compute all the intrinsic factors, see http://swlab.njit.edu/techwatch.) We acknowledge that this method is controversial because it sounds arbitrary. However, we find it adequate for our purposes because it generally reflects our intuition about how candidate languages compare with respect to each intrinsic factor.
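
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the article names the lowest- and highest-ranked generality features, so the eight features in between are placeholders, and the function name `factor_score` is hypothetical.

```python
# Ranked feature list for one intrinsic factor (generality).
# Only the first and last entries come from the article; the eight
# entries in between are placeholders, not the authors' actual features.
GENERALITY_FEATURES = (
    ["constant literals"]                               # rank 1 (from the article)
    + [f"placeholder feature {i}" for i in range(2, 10)]
    + ["generic abstract data types"]                   # rank 10 (from the article)
)


def factor_score(ranked_features, offered):
    """Sum the 1-based ranks of the features that a language offers."""
    return sum(
        rank
        for rank, feature in enumerate(ranked_features, start=1)
        if feature in offered
    )


# A made-up language offering the features ranked 1, 2, and 10.
offered_by_language = {
    "constant literals",
    "placeholder feature 2",
    "generic abstract data types",
}
score = factor_score(GENERALITY_FEATURES, offered_by_language)
print(score)  # 1 + 2 + 10 = 13
```

With N = 10 features, a language that offers every feature reaches the maximum score of 1 + 2 + ... + 10 = 55, so higher-ranked features weigh more heavily than lower-ranked ones.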
Quantifying extrinsic factors is relatively
easy because most of them ask for numbers.
We’ll just use the numbers as the value of each
extrinsic factor. We will encounter difficulties
deriving these numbers in practice, but that’s a
data collection issue (we’ll come back to this
later), not a quantification issue.
Empirical investigation
Before we present our summary statistical
model, we start with the following premises:
■ We adopt the intrinsic factors as the model’s independent variables, because they influence the fate of a language but are themselves constant.
■ Because many extrinsic factors feed into themselves and might influence others, we adopt past values of the extrinsic factors as independent variables.
■ We adopt present and future values of the extrinsic factors as the model’s dependent variables.
■ We don’t represent a language’s status by the simple binary premise of successful or unsuccessful, as this would be arbitrarily judgmental. Rather, we represent its status by the vector of all its current extrinsic factors.
Thus, as Figure 2 shows, our model’s independent variables include the intrinsic factors and the past history of extrinsic factors, and the dependent variables include the current (or future) values of the extrinsic factors.
To evaluate intrinsic factors, we use the
quantification procedures we discussed earlier.
We refer to the original language manual and
determine whether the language offers each
relevant feature.
To collect information about grassroots support, we set up a Web-based survey form and invited software engineering professionals to fill it out online. The information we requested from participants pertained to their knowledge of, familiarity with, and practice of relevant languages for the current year (we conducted the survey reported here in 2003) as well as for 1998 and 1993. We publicized our survey widely through professional channels (for example, Google, Yahoo, and other computer professional newsgroups) to maximize participation.
Collecting information for the other extrinsic factors is significantly more difficult than for intrinsic factors or grassroots support. For the sake of illustration, we briefly discuss the factor of institutional support, which requires such information as the number of students who know about a language and the number of students who use some language as their primary language for school work. To derive this factor, we
■ selected a set of universities worldwide (in the US, Canada, Europe, Asia, Africa, and the Middle East), where each one represents a class of similar universities;
■ obtained syllabus information to infer language usage for 1993, 1998, and 2003;
■ obtained enrollment information through published resources or through direct contact; and
■ prorated the results for each university in the sample using the number and size of other universities of the same class.
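
One plausible reading of the prorating step is sketched below. The article doesn't give the exact formula, so the sketch assumes that the share of students using a language at the sampled university is scaled up by the total enrollment of its class. The class `UniversityClass`, the function `prorated_users`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UniversityClass:
    """One sampled university plus the class of similar universities it represents."""
    sample_users: int       # students using the language at the sampled university
    sample_enrollment: int  # enrollment at the sampled university
    class_enrollment: int   # combined enrollment of all universities in the class


def prorated_users(classes):
    """Estimate total users by scaling each sampled share to its whole class."""
    total = 0.0
    for c in classes:
        if c.sample_enrollment <= 0:
            raise ValueError("sample enrollment must be positive")
        share = c.sample_users / c.sample_enrollment
        total += share * c.class_enrollment
    return total


# Made-up figures for two classes of universities.
classes = [
    UniversityClass(sample_users=120, sample_enrollment=400, class_enrollment=10_000),
    UniversityClass(sample_users=30, sample_enrollment=300, class_enrollment=5_000),
]
estimate = prorated_users(classes)  # 0.30 * 10,000 + 0.10 * 5,000
```

Other weightings (for example, by number of universities rather than by enrollment) would fit the description equally well; the sketch only shows the general shape of such an extrapolation.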
[Figure 2 diagram not reproduced. The model F takes the intrinsic factors (I1, ..., Im) and the past history of extrinsic factors (e1, ..., ek) as inputs and outputs the current or future extrinsic factors (E1, ..., Ek).]
Figure 2. Our model of programming
language trends.
[Figure 3 plots not reproduced. Each panel plots the years 1993, 1998, and 2003 on the horizontal axis for Ada, Basic, C, C++, Fortran, Java, Pascal, and Smalltalk; the vertical axes are (a) percent of respondents, (b) number of students, and (c) number of companies.]
Figure 3. Usage trends of the eight most popular programming languages, 1993–2003: (a) all respondents’ responses regarding their primary programming language (grassroots support); (b) students’ primary language use for their coursework (institutional support); (c) companies’ primary language use for product development (industrial support).
Data analysis
In this project, we used factor analysis [9] to investigate the latent factors in intrinsic- and extrinsic-factor groups (and canonical analysis for advanced study). In this article, we focus on the raw data, the models we constructed, and the relevant results we derived from our analysis.
Raw data
According to the data we collected, the five most popular programming languages (the ones most people considered their primary programming language) in 1993 were C, Pascal, Basic, Fortran, and C++. The five most popular languages in 1998 were C, C++, Smalltalk, Fortran, and Pascal. The five most popular languages in 2003 were C++, Java, Smalltalk, Ada, and Fortran. Figure 3a shows these languages’ usage trends from 1993 to 2003 for the grassroots support factor. The data represents the percentage of survey respondents in 1993, 1998, and 2003 who considered each of these languages as their primary programming language.
Figures 3b and 3c show sample raw data for institutional support and industrial support, respectively. Figures describing other raw data and the complete data warehouse are available on the project Web site.
Statistical results
We use standard factor analysis and canonical correlation to assess the relationship between variables. We performed two kinds of analysis: one with only the factors in the intrinsic group, and the other with both intrinsic and extrinsic factors [9].
We did the first analysis to seek the meaningful relationships between a language’s intrinsic factors and the value of its dependent variables. As an example, we consider here the impact of intrinsic factors on the number of developers who consider each language as their primary development language. Table 1 presents our sample correlation results.
The data shows that machine independence, extensibility, and generality have more impact on this extrinsic factor than other intrinsic factors. After analyzing this data for all factors, we found that the most important intrinsic factors are generality, reliability, machine independence, and extensibility.
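
To make the notion of a correlation between an intrinsic factor and an extrinsic factor concrete, here is a minimal sketch that computes a Pearson correlation coefficient across languages. It assumes Pearson’s r, which the article doesn’t state explicitly, and the two data lists are invented for illustration; they are not the study’s data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, one entry per language.
generality_scores = [10, 25, 30, 45, 55]         # intrinsic factor score
primary_language_share = [2.0, 4.0, 3.0, 8.0, 9.0]  # % naming it their primary language

r = pearson(generality_scores, primary_language_share)
print(f"r = {r:.4f}")
```

A value near 1 or -1 indicates a strong linear association and a value near 0 a weak one, which is how the entries of Table 1 can be read.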
We applied the second model to show the correlations between all factors, including intrinsic and extrinsic ones. Most of the time, the relationships in the first model didn’t show up in the second analysis. Some relationships were noteworthy, such as those with variables from technology groups; some just show highly related factors between some variables. Space limitations prohibit us from presenting all the results in detail, but the rotated factor pattern for extrinsic factors supports the following two conclusions:
■ Factors that fall under institutional support play an important role in many of the seven factors. Perhaps this reflects that, with the five-year step of our study (1993, 1998, 2003), we have an opportunity to show how institutional decisions affect industrial trends through student training.
■ Factors that fall under technology support play an important role in many of the seven factors; in fairness, that might be a consequence of a language’s success rather than its cause.
To show a language trend, we construct multivariate regression models [10] using the independent intrinsic and extrinsic factors. The multivariate regression equation has the form
$$Y = A + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k + E$$
where $Y$ is the dependent variable’s predicted value, $A$ is the $Y$ intercept, $B_1, \ldots, B_k$ are the regression coefficients, $X_1, \ldots, X_k$ are the independent variables, and $E$ is an error term.
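
As a rough illustration of how the intercept and coefficients of such an equation can be estimated, the sketch below fits them by ordinary least squares using the normal equations. It is a self-contained stand-in, not the study’s code (the authors used a statistics package), and the function name `fit_linear` and the toy data are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares for y ~ A + B1*x1 + ... + Bk*xk.

    X is a list of rows of predictor values and y the list of observed
    responses. Returns [A, B1, ..., Bk].
    """
    if len(X) != len(y) or not X:
        raise ValueError("X and y must be non-empty and equally long")
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations: (Z^T Z | Z^T y).
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or data is insufficient")
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(p):
            if k != col:
                factor = M[k][col] / M[col][col]
                M[k] = [a - factor * b for a, b in zip(M[k], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Toy data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
A, B1, B2 = fit_linear(X, y)
```

Solving the normal equations directly is fine for a handful of predictors; for larger or ill-conditioned problems, a QR- or SVD-based solver is usually preferred.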
We use the SAS statistics package to analyze the raw data and construct the statistical models. Our factor analysis and regression reports are available at the project’s Web site.

Table 1. Sample correlation results for intrinsic factors only

Intrinsic factor        Correlation with no. of developers who consider each language their primary one
Generality              0.6913
Orthogonality           0.0199
Reliability             0.3199
Maintainability         0.0470
Efficiency              0.0703
Simplicity              –0.4703
Implementability        –0.3390
Machine independence    0.8876
Extensibility           0.7625
Expressiveness          0.3024
Influence/impact        0.0552

Toward a predictive model
To predict the future trends of programming languages, we can revise our original regression models. The derivative model will show the relationships among the data from 1993, 1998, and 2003. We construct the derivative regression models as follows:
$$E_{2003} = A I + B E_{1998} + C E_{1993} + D$$
where $E_{2003}$, $E_{1998}$, and $E_{1993}$ are the values of extrinsic factors in 2003, 1998, and 1993, respectively; $A$ is the parameter matrix for the intrinsic factors; $I$ is the value of the intrinsic factors; $B$ and $C$ are the parameter matrices for the extrinsic factors in 1998 and 1993, respectively; and $D$ is a constant value.
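
Written out for a single extrinsic factor, the derivative model might look as follows. This expansion is an assumption about the intended shapes rather than something the article spells out: it takes $m$ intrinsic factors (11 in this study) and $k$ extrinsic factors, and treats the constant $D$ as having one entry $D_j$ per extrinsic factor.

```latex
% Component j of the derivative model, for j = 1, ..., k (assumed dimensions).
E_{2003}^{(j)} \;=\; \sum_{i=1}^{m} A_{ji}\, I_i
  \;+\; \sum_{l=1}^{k} B_{jl}\, E_{1998}^{(l)}
  \;+\; \sum_{l=1}^{k} C_{jl}\, E_{1993}^{(l)}
  \;+\; D_j
```

Under this reading, $A$ is a $k \times m$ matrix and $B$ and $C$ are $k \times k$ matrices, so each extrinsic factor in 2003 is a linear combination of the intrinsic factors and of all extrinsic factors from the two earlier survey years.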
Validation
We construct this derivative model using 12 languages and use five languages to validate it. We consider the extrinsic factor of “What percentage of people knew this programming language in 2003?” and compare the actual value collected from our survey against the predicted value produced by our regression model.
Table 2 presents our results. We used the F-statistic, a standard statistical method for checking whether there are significant differences between two groups, to validate the prediction. In the F-table, for α = 0.05, F must be greater than 4.49 to reject the hypothesis of statistical correlation between the actual and predicted values. Because our F value is 0.235, we cannot reject that hypothesis, which supports the model’s predictions.
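
As a complementary, purely descriptive check (not the F-test described above), the following sketch computes the prediction error for each validation language from the values in Table 2, along with the mean absolute error. The choice of error measure is ours, made only for illustration.

```python
# Values copied from Table 2 (percent of people who knew the language in 2003).
actual = {"Ada": 5.19, "Eiffel": 5.90, "Lisp": 7.68, "Pascal": 54.29, "Smalltalk": 10.06}
predicted = {"Ada": 6.94, "Eiffel": 7.16, "Lisp": 7.74, "Pascal": 48.81, "Smalltalk": 8.48}

# Signed error per language, in percentage points (predicted minus actual).
errors = {lang: predicted[lang] - actual[lang] for lang in actual}

# Mean absolute error across the five validation languages.
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

# Language with the largest absolute error.
worst = max(errors, key=lambda lang: abs(errors[lang]))

for lang, err in errors.items():
    print(f"{lang:<10} {err:+.2f}")
print(f"mean absolute error: {mean_abs_error:.3f} points; largest for {worst}")
```

The per-language errors show that the model overestimates Ada, Eiffel, and Lisp slightly and underestimates Pascal and Smalltalk, with Pascal the largest miss in absolute terms.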
Application
Based on the assumption that, on the whole, trends from 1998 to 2008 should be similar to those from 1993 to 2003, we use the following extended derivative model to predict each extrinsic factor’s value in 2008 by substituting the 1998 value into the 1993 position and the 2003 value into the 1998 position in the model:
$$E_{2008} = A I + B E_{2003} + C E_{1998} + D$$
Using this formula, we can get the value for each extrinsic factor in 2008. Figure 4 shows the trends of the most popular languages from 1993 to 2008. It seems that from 2003 to 2008, Java will be the only language that’s still increasing in popularity. All the others will decline and begin to enter a stable period where the percentage won’t change too much. Because this model is based on past history, it’s valid only as long as the past conditions prevail; it doesn’t reflect the possible impact of a popular new language’s emergence. For example, C# will definitely affect Java’s future popularity, so we should revise and improve the predictive model according to new technology changes.
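
The shift from the fitted model to the 2008 forecast can be sketched for a single extrinsic factor of a single language. Everything numeric here is invented: the coefficients `a`, `b`, `c`, `d`, the intrinsic scores, and the survey values are placeholders, and `predict_next` is a hypothetical helper, not the authors’ implementation.

```python
def predict_next(intrinsic, e_prev, e_prev2, a, b, c, d):
    """Evaluate E_next = a.I + b*E_prev + c*E_prev2 + d for one extrinsic factor.

    intrinsic -- intrinsic factor scores of the language (vector I)
    e_prev    -- the factor's value five years before the target year
    e_prev2   -- the factor's value ten years before the target year
    a         -- coefficients for the intrinsic factors (one row of A)
    b, c, d   -- scalar coefficients and constant for this factor
    """
    if len(a) != len(intrinsic):
        raise ValueError("need one coefficient per intrinsic factor")
    return sum(ai * xi for ai, xi in zip(a, intrinsic)) + b * e_prev + c * e_prev2 + d


# Hypothetical coefficients, as if fitted on the 1993/1998/2003 data.
a, b, c, d = [0.01, -0.02], 0.9, -0.1, 0.5
intrinsic = [40, 25]                     # two made-up intrinsic scores
e1993, e1998, e2003 = 2.0, 10.0, 18.0    # made-up survey percentages

# The model as fitted: 1998 and 1993 explain 2003.
fitted_2003 = predict_next(intrinsic, e1998, e1993, a, b, c, d)

# The forecast: the same coefficients with every input moved forward five years.
forecast_2008 = predict_next(intrinsic, e2003, e1998, a, b, c, d)
```

Because the coefficients are reused unchanged, the forecast inherits the assumption stated above: the relationship observed between 1993 and 2003 is taken to hold between 1998 and 2008.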
Our statistical analysis has barely explored our data’s potential. Prospects of future research include further analyzing our data as well as exploring other compact sets of trends, such as operating systems, database systems, or Web browsers. The combined synthesis of all these studies might give us insights into the evolution of new trends that have evaded classification thus far [11].

Table 2. Comparing the model’s actual and predictive values

Language     Actual value reported by survey (%)    Predictive value produced by model (%)
Ada          5.19                                   6.94
Eiffel       5.90                                   7.16
Lisp         7.68                                   7.74
Pascal       54.29                                  48.81
Smalltalk    10.06                                  8.48

[Figure 4 plot not reproduced. It plots percent of respondents (vertical axis, 0 to 30) against year (1993, 1998, 2003, 2008) for Ada, C, C++, Fortran, Java, Pascal, and Smalltalk.]
Figure 4. Usage trends showing the most popular languages from 1993 to 2008.

References
1. R.D. Cowan et al., “Software Engineering Technology Watch,” IEEE Software, vol. 19, no. 4, 2002, pp. 123–130.
2. G.A. Moore, Crossing the Chasm, Harper Business, 1999.
3. S.T. Redwine and W.E. Riddle, “Software Technology Maturation,” Proc. 8th Int’l Conf. Software Eng., IEEE CS Press, 1985, pp. 189–200.
4. P. Brereton et al., “The Future of Software,” Comm. ACM, vol. 42, no. 12, 1999, pp. 78–84.
5. E. Levenez, “Computer Languages History,” 2 Mar. 2005; www.levenez.com/lang.
6. K.C. Louden, Programming Language Principles and Practice, PWS Publishing, 1993.
7. US Dept. of Defense, “Steelman: Requirements for High Order Computer Programming Languages,” June 1978; www.xgc.com/manuals/steelman/t1.html.
8. S. Findy and B. Jacobs, “How to Design a Programming Language: A Survey of Scripting Programming Language Feature Options,” 19 Feb. 2004; www.geocities.com/tablizer/langopts.htm.
9. StatSoft, “Principal Components and Factor Analysis,” 2003; www.statsoftinc.com/textbook/stfacan.html.
10. A.L. Edwards, Multiple Regression and the Analysis of Variance and Covariance, 2nd ed., W.H. Freeman and Co., 1979.
11. Y. Chen, “Programming Language Trends: An Empirical Study,” doctoral dissertation, New Jersey Inst. of Technology, 2003.
About the Authors
Yaofei Chen is a senior researcher at Principia Partners. His research interests are in software engineering and programming languages. He received his PhD in computer and information science from New Jersey Institute of Technology. Contact him at the Dept. of Computer Science, NJIT, Newark, NJ 07102; yfchen@cis.njit.edu.
Rose Dios is an associate professor in the Department of Mathematical Science at the New Jersey Institute of Technology, where she received her PhD in mathematics. Her research interests include risk analysis, statistical decision theory, and reliability theory. Contact her at the Dept. of Mathematics, NJIT, Newark, NJ 07102; rodios@m.njit.edu.
Ali Mili is a computer science professor at the New Jersey Institute of Technology. His research interests are in software engineering. He received his Doctorat es-Sciences d’Etat from the University of Grenoble, France, and his PhD in computer science from the University of Illinois at Urbana-Champaign. Contact him at the Dept. of Computer Science, NJIT, Newark, NJ 07102; mili@oak.njit.edu.
Lan Wu is a doctoral student at the New Jersey Institute of Technology, where she received her MS in computer science. Contact her at the Dept. of Computer Science, NJIT, Newark, NJ 07102; lw7@njit.edu.
Kefei Wang is a statistician in the California Birth Defects Monitoring Program. He received his MS in statistics from the State University of New York, Albany. Contact him at 1917 Fifth St., Berkeley, CA 94710; kwa@cbdmp.org.