Do Programming Languages Affect Productivity?
A Case Study Using Data from Open Source Projects
Daniel P. Delorey
Brigham Young University
Provo, UT
pierce@cs.byu.edu
Charles D. Knutson
Brigham Young University
Provo, UT
knutson@cs.byu.edu
Scott Chun
Brigham Young University
Provo, UT
chun@cs.byu.edu
Abstract
Brooks and others long ago suggested that on aver-
age computer programmers write the same number of
lines of code in a given amount of time regardless of
the programming language used. We examine data col-
lected from the CVS repositories of 9,999 open source
projects hosted on SourceForge.net to test this assump-
tion for 10 of the most popular programming languages
in use in the open source community. We find that for
24 of the 45 pairwise comparisons, the programming
language is a significant factor in determining the rate
at which source code is written, even after accounting
for variations between programmers and projects.
1 Introduction
Brooks is generally credited with the assertion that
annual lines-of-code programmer productivity is con-
stant, independent of programming language. In mak-
ing this assertion, Brooks cites multiple authors includ-
ing [7] and [8]. Brooks states, “Productivity seems con-
stant in terms of elementary statements, a conclusion
that is reasonable in terms of the thought a statement
requires and the errors it may include.” [1] (p. 94) This
statement, as well as the works it cites, however, ap-
pears to be based primarily on anecdotal evidence. We
test this assertion across ten programming languages
using data from open source software projects.
2 Related Work
Various studies of productivity in software develop-
ment have been reported, including [5, 4, 6, 3].
Empirical studies of programmer productivity differ
in the productivity measures used, the types and quan-
tities of data used, the explanatory factors considered,
the goals of the study, and the conclusions reached.
The most common productivity metrics are lines of
code per unit time [5] and function points per unit time
[4, 6, 3]. While compelling arguments are made in the
literature for both of these metrics, we use lines of code
because the assertion we are testing was stated in
terms of lines of code.
Studies of software development productivity tend
to rely on observational data collected from commer-
cial projects. Maxwell et al. use data collected from 99
projects from 37 companies in eight European countries
[5] and data gathered from 206 projects from 26 com-
panies in Finland [4]. Premraj et al. use an updated
version of the same data set with over 600 projects [6].
Liebchen et al. use a data set representing more than
25,000 projects from a single company [3]. Our data
set was collected from the CVS repositories of 9,999
open source projects hosted on SourceForge.
The data sets used in these studies were each com-
piled manually with some level of subjectivity and
transformation. Given this level of human involve-
ment, the factors they consider are at a high level of
abstraction. For example, the data set in [5] contains
among its variables seven COCOMO factors, includ-
ing required reliability, execution time constraints, and
main storage constraints, each with discrete ordinal
values between 1 and 6. Our data set contains only
those features that can be calculated from the data in
a CVS repository. As such, our data is limited concep-
tually but has the advantages of being concrete, objec-
tive, and simple to gather.
In each of the papers cited, the stated goal of the
study was to identify the major factors influencing pro-
grammer productivity. The models developed in these
studies were intended to be either predictive, explana-
tory, or both. Our goal is not to construct a predictive
or explanatory model. Rather, we seek only to develop
a model that sufficiently accounts for the variation in our data so that we may test the significance of the estimated effect of programming language.
Table 1. Top ten programming languages by popularity rankings

Language     Project  Author  File   Revision  LOC   Final
             Rank     Rank    Rank   Rank      Rank  Rank
C            1        1       2      2         1     1
Java         2        2       1      1         2     2
C++          4        3       4      4         3     3
PHP          5        4       3      3         4     4
Python       7        7       5      5         5     5
Perl         3        5       9      9         6     6
JavaScript   6        6       6      8         10    7
C#           9        9       7      6         7     8
Pascal       8        10      8      7         8     9
Tcl          11       8       10     10        9     10
3 Data Collection
The data we use in our analysis comes from the CVS
repositories of open source projects hosted on Source-
Forge. The tools we developed and methods we em-
ployed in collecting the data are described in [2].
As CVS manages individual changes (called revisions), it records the author of the change, the date
and time the change happened, the number of lines
that were added to and removed from the file, and a
mandatory free-form message supplied by the author.
These minimal data can be combined to produce a rich
set of values describing the environment in which the
change was made.
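To illustrate how such per-revision records can be derived, the sketch below parses the author, date, and line-delta fields from one revision entry. The log-line layout shown is the one typically printed by cvs log; it is an assumption made for illustration and is not taken from the cvs2mysql tool described in [2].

    import re
    from datetime import datetime

    # A revision entry as typically printed by 'cvs log' (layout assumed for
    # illustration; field names here are not taken from cvs2mysql):
    #   date: 2003/05/12 14:23:11;  author: jdoe;  state: Exp;  lines: +10 -2
    def parse_revision(line):
        date_m = re.search(r"date:\s*([\d/]+ [\d:]+)", line)
        author_m = re.search(r"author:\s*([^;]+);", line)
        lines_m = re.search(r"lines:\s*\+(\d+)\s*-(\d+)", line)
        if not (date_m and author_m):
            return None
        year = datetime.strptime(date_m.group(1), "%Y/%m/%d %H:%M:%S").year
        return {
            "author": author_m.group(1).strip(),
            "year": year,
            "lines_added": int(lines_m.group(1)) if lines_m else 0,
            "lines_removed": int(lines_m.group(2)) if lines_m else 0,
        }

    print(parse_revision(
        "date: 2003/05/12 14:23:11;  author: jdoe;  state: Exp;  lines: +10 -2"
    ))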
We collected data from the CVS repositories of 9,999
projects hosted on SourceForge. Our population for
the data collection was the set of projects that met
the following criteria: 1) the project’s development
stage is set as Production/Stable or Maintenance; 2)
the project is active; 3) the project uses CVS; 4) the
project is open source.
We gathered the entire history for each of the 9,999
CVS repositories and stored the resulting data in a
MySQL relational database using a tool we developed
called cvs2mysql [2]. The resulting raw data contains
records for 7,244,201 files and 26,559,460 changes to
those files made by 23,838 developers.
3.1 Data Preparation
Of the more than 19,000 different file extensions rep-
resented in the SourceForge database, we identified 107
unique programming language extensions. In order to
limit the scope of our study to the languages that are
most widely used, we produced an ordered list of the
most popular programming languages represented in
the database. Popularity is defined here in terms of:
1) total number of projects using the language; 2) to-
tal number of authors writing in the language; 3) total
number of files written in the language; 4) total num-
ber of revisions to files written in the language; and
5) total number of lines written in the language. We
ranked each language using these five metrics and cal-
culated the average ranking for each language. We then
ranked the languages by their average rankings to de-
termine an overall ranking. We chose to focus on the
top 10 programming languages, which are listed along with their rankings in Table 1. These 10 languages are used in 89% of all projects, by 92% of all authors, and account for 98% of the files, 98% of the revisions, and 99% of the lines of code in our data set. The next three
most popular languages are Prolog, Lisp, and Scheme,
none of which can be easily compared to imperative
and object-oriented languages on a line by line basis
given the differences in programming paradigm.
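The ranking procedure can be sketched as follows, assuming per-language totals for the five metrics are already available; the counts and column names below are illustrative placeholders, not the values underlying Table 1.

    import pandas as pd

    # Illustrative per-language totals (placeholder values, not the paper's data).
    counts = pd.DataFrame({
        "projects":  {"C": 4000, "Java": 3500, "Python": 1200},
        "authors":   {"C": 9000, "Java": 8000, "Python": 3000},
        "files":     {"C": 2_100_000, "Java": 2_300_000, "Python": 400_000},
        "revisions": {"C": 8_000_000, "Java": 8_500_000, "Python": 1_200_000},
        "loc":       {"C": 300_000_000, "Java": 250_000_000, "Python": 40_000_000},
    })

    # Rank each metric separately (1 = most popular), average the five ranks,
    # and rank the averages to obtain the overall ordering used in Table 1.
    per_metric_ranks = counts.rank(ascending=False)
    overall_rank = per_metric_ranks.mean(axis=1).rank(method="min")
    print(overall_rank.sort_values())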
We compare annual productions per programmer
per language in an effort to limit the impact of normal
variations in the amount of time individual program-
mers commit to development over smaller time peri-
ods. Data collection was limited to the time period
from January 1, 2000 to December 31, 2005.
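A minimal sketch of this aggregation step is shown below using pandas; the table and column names (author, language, commit_date, lines_added) are hypothetical stand-ins for the fields extracted from CVS.

    import pandas as pd

    # One row per committed revision; column names are hypothetical.
    revisions = pd.read_csv("revisions.csv", parse_dates=["commit_date"])

    in_window = revisions["commit_date"].between("2000-01-01", "2005-12-31")
    observations = (
        revisions[in_window]
        .assign(year=lambda df: df["commit_date"].dt.year)
        .groupby(["author", "language", "year"], as_index=False)["lines_added"]
        .sum()
        .rename(columns={"lines_added": "annual_loc"})
    )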
Our model of aggregating the lines written across
authors, programming languages, and years assumes
that every line committed to CVS by an author was
written by that author during the year in which it was
committed. However, we identified six ways in which
this assumption can be violated:
Migration: An existing CVS repository created by multiple authors and/or over multiple years is migrated to SourceForge by a single author.

Dead File Restoration: When a dead file is restored in CVS, the contents are not differenced against the pre-removal version.

Multi-Project Files: Authors may contribute the same file to multiple projects.

Gatekeepers: Gatekeepers receive credit for all the lines they commit even if they were not the author.

Batch Commits: An author may work for more than a year before committing the changes.

Automatic Code Generation: The tools an author uses to program may automatically generate lines of code which the author then commits to CVS.
While the data collected by CVS does not allow us
to definitively identify all cases that violate our as-
sumptions, we have taken steps to exclude as many
offending cases as possible while sacrificing as few of
the cases that do not violate our assumptions as is rea-
sonable. To remove the migration cases, we excluded
initial revisions for all files in our data set. To remove the dead file restoration cases, we excluded all revisions that followed a “dead” revision.
Table 2. Potential explanatory factors considered

Language-related factors per year
  For the current year: months since first recorded use; active projects using this language; active authors using this language; current files written in this language; total number of lines written in this language.
  Aggregated over prior years: total projects having used this language; total authors having used this language; total files written in this language; total number of lines written in this language.

Author-related factors per year
  For the current year: months since first contribution; active projects with contributions; number of programming languages used; current files edited; total number of lines written.
  Aggregated over prior years: total projects with contributions; total number of programming languages used; total files edited by this author; total number of lines written by this author.

Language-specific author-related factors per year
  For the current year: months since first contribution; active projects with contributions; current files edited.
  Aggregated over prior years: total number of lines written; total projects with contributions; total files edited by this author.

Temporal factor: calendar year.
After removing these cases, however, significant unrealistic outliers remained in our data set. To remove these outliers, we
limited our population to those authors who had writ-
ten fewer than 80,000 lines of source code in a single
year. Since we believe that those authors who wrote
more than 80,000 lines in a single year are exhibiting
one of the non-population behaviors described above,
we also exclude from our analysis the projects to which
they contributed.
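The exclusion steps described above might look roughly like the following; the flag and column names are hypothetical, since the actual filtering was performed on the cvs2mysql database.

    import pandas as pd

    # Hypothetical per-revision table with flags assumed to have been derived
    # from the CVS history during data preparation.
    revisions = pd.read_csv("revisions.csv")

    filtered = revisions[
        ~revisions["is_initial_revision"]        # migrated histories appear as large initial commits
        & ~revisions["follows_dead_revision"]    # restored files are not diffed against the removed version
    ]

    # Authors committing 80,000 or more lines in a single year are treated as
    # outside the population, together with every project they contributed to.
    per_author_year = filtered.groupby(["author", "year"])["lines_added"].sum()
    outlier_authors = (
        per_author_year[per_author_year >= 80_000]
        .index.get_level_values("author")
        .unique()
    )
    outlier_projects = filtered.loc[
        filtered["author"].isin(outlier_authors), "project"
    ].unique()

    population = filtered[
        ~filtered["author"].isin(outlier_authors)
        & ~filtered["project"].isin(outlier_projects)
    ]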
After limiting target programming languages and re-
moving observations deemed to be outside our popula-
tion, our target data contains records of 673,528 files,
4,198,724 revisions, and 16,197 authors. These data
are aggregated across author, programming language,
and year into 34,566 observations in our final data set.
4 Data Analysis
The goal of our data analysis is to determine
whether there is evidence in the data we have collected
that programming languages affect annual programmer
productivity. Our dependent variable in this analysis is the lines of code committed to the CVS repositories
of selected SourceForge projects by an individual au-
thor in a single year. Our independent variable is the
programming language being used. We test all pair-
wise differences between the languages, adjusting our
confidence intervals using the Tukey-Kramer Honest
Significant Difference for multiple comparisons.
Clearly there are factors other than programming
language that affect programmer productivity. Before
testing the significance of the programming language
effect, we must account for the effects of these con-
founding variables. We do this by including the con-
founding factors in a multiple linear regression analy-
sis along with the independent variables so that their
effects can be separated. The potential confounding
factors we consider in this analysis are listed in Table
2. It is important to note that our goal is only to sepa-
rate confounding effects before testing our independent
variable. Our model is not intended to be predictive
or explanatory. Therefore, we do not report the coeffi-
cients or the p-values of the confounding factors.
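In outline, the kind of regression described here can be expressed with statsmodels as below; the confounding factors named in the formula are placeholders rather than the paper's selected factor set.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical author-language-year observations (one row each).
    obs = pd.read_csv("observations.csv")

    # Confounders plus the language factor in one multiple linear regression,
    # so their effects can be estimated separately (factor names are placeholders).
    model = smf.ols(
        "annual_loc ~ months_active + active_projects + files_edited"
        " + prior_loc + C(language)",
        data=obs,
    ).fit()
    print(model.summary())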
We develop our model by first excluding the pro-
gramming language and considering only the confound-
ing factors as independent variables. We systematically
remove independent variables until we achieve the sim-
plest model that still explains a significant portion of
the variation in our data. To this model we then add
the programming language factor and test its signif-
icance. The procedure for reducing the model is ex-
plained below.
We begin by removing independent variables that
are highly correlated. Using correlated independent
variables in a multiple regression leads to a condition
known as multicollinearity, which can affect the precision of estimates in unexpected ways. The Variance Inflation Factor (VIF) is a measure of multicollinearity. A VIF value greater than 10 is considered large.
Using multicollinearity analysis we remove five of the
independent variables. These variables along with their
VIF values are listed in Table 3.
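A VIF screening step of this kind can be sketched with statsmodels; the candidate factor names below are placeholders.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    obs = pd.read_csv("observations.csv")   # hypothetical observations
    candidates = ["months_active", "active_projects", "files_edited", "prior_loc"]

    X = sm.add_constant(obs[candidates])
    vif = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
        index=X.columns,
    ).drop("const")

    print(vif[vif > 10])   # factors flagged for removal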
We next remove independent variables that have no
explanatory power. To be useful as an independent
variable in a multiple linear regression, a variable must
have a linear relationship with the dependent variable.
Table 3. Explanatory factors excluded from our analysis
Factors Excluded Due to High Variance Inflation Factors (VIF Value)
Total authors having used the programming language in prior years (1860)
Total authors using the programming language in the current year (258)
Total projects having used the programming language in prior years (68)
Files written in the programming language in the current year (51)
Active projects using the programming language in the current year (12)
Factors Excluded Due to Low Correlation with the Dependent Variable (Correlation)
Months since the first recorded use of the programming language (0.0071)
Calendar Year (0.0093)
Factors Excluded Due to Practically Insignificant Coefficients (Coefficient)
Total number of lines written in the language during the current year (0.0000)
Total number of lines written in the language during prior years (0.0001)
Factors Removed During Variable Selection Using the Cp Statistic
Total number of languages used by the author during prior years
Total number of files written in the language during prior years
Correlation is a measure of linear relationship. Using the correlation between each independent variable and the dependent variable, we are able to remove two of the independent variables. These variables along with their correlation coefficients are listed in Table 3.
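That screening step might be expressed as below; the factor names and the cutoff are placeholders chosen for illustration.

    import pandas as pd

    obs = pd.read_csv("observations.csv")   # hypothetical observations
    candidates = ["months_active", "calendar_year", "prior_projects"]

    # Absolute correlation of each remaining factor with the response;
    # factors with negligible linear association are dropped.
    corr = obs[candidates].corrwith(obs["annual_loc"]).abs()
    weak = corr[corr < 0.01].index.tolist()
    obs = obs.drop(columns=weak)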
Fitting a regression on the remaining variables, we find that two of the variables have an estimated coefficient equal to or near zero. These coefficients are not statistically significant, but more importantly, they are not practically significant either, so they are removed. These variables along with their estimated coefficients
are listed in Table 3.
Finally, the last step in reducing our model is to
fit regressions using all possible subsets of the remain-
ing variables and pick the model that best satisfies a
model-fitting criterion. The model fitting criterion we
use is the Cp statistic. The Cp statistic focuses directly
on the trade-off between bias due to excluding impor-
tant independent variables and extra variance due to
the inclusion of too many variables. Using Cp selec-
tion on the remaining 16 independent variables, we
find the model with the lowest Cp statistic in which
all independent variables are significant contains 14 in-
dependent variables. The two independent variables
excluded from this model are listed in Table 3.
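Mallows' Cp selection over all subsets can be sketched as follows; the candidate factor names are placeholders, and the exhaustive search is only practical for a modest number of factors, as here.

    from itertools import combinations
    import pandas as pd
    import statsmodels.api as sm

    obs = pd.read_csv("observations.csv")          # hypothetical observations
    candidates = ["months_active", "active_projects", "files_edited", "prior_loc"]

    y = obs["annual_loc"]
    full = sm.OLS(y, sm.add_constant(obs[candidates])).fit()
    sigma2 = full.mse_resid                        # error variance from the full model
    n = len(y)

    def mallows_cp(cols):
        sub = sm.OLS(y, sm.add_constant(obs[list(cols)])).fit()
        p = len(cols) + 1                          # parameters, including the intercept
        return sub.ssr / sigma2 - (n - 2 * p)

    scores = {
        cols: mallows_cp(cols)
        for k in range(1, len(candidates) + 1)
        for cols in combinations(candidates, k)
    }
    best_subset = min(scores, key=scores.get)
    print(best_subset, scores[best_subset])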
Our final model contains 14 independent variables.
Again, the goal of our analysis is not to create a pre-
dictive or an explanatory model but rather to control
as much of the variation in the data as possible before
testing the significance of the effect of programming
language on average annual programmer productivity.
Therefore, we do not explicitly present the independent
variables included in our model to prevent the casual
reader from interpreting our model as explanatory or
predictive. For the curious reader, the independent
variables included in our model can be determined us-
ing Table 2 and Table 3. The R² for our model is 0.80,
meaning that it explains 80% of the variation in our
data. All the independent variables are statistically
significant at p < 0.05. The model is significant at
p < 0.0001.
5 Results
To test the assertion that programmer productivity
is constant in terms of lines of code per year regardless
of the programming language being used, we fit a model
consisting of the 14 independent variables selected in
Section 4 to adjust for variation in programmer ability
and programming language use. To this model, we add
indicator variables for the programming languages we
are considering. By running the analysis nine times and
using a different language as the reference each time, we
are able to determine the estimated differences between
the languages and the standard errors for each of those
estimates which we then use to test the significance of
the differences.
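One way to obtain all pairwise estimates and standard errors is to re-fit the model with each language in turn as the reference level of the indicator coding, as sketched below; the confounder list is a placeholder for the 14 selected variables.

    import pandas as pd
    import statsmodels.formula.api as smf

    obs = pd.read_csv("observations.csv")   # hypothetical observations
    languages = ["C", "C++", "C#", "Java", "JavaScript",
                 "Pascal", "Perl", "PHP", "Python", "Tcl"]
    confounders = "months_active + active_projects + files_edited + prior_loc"

    estimates = {}
    for ref in languages:
        fit = smf.ols(
            f"annual_loc ~ {confounders}"
            f" + C(language, Treatment(reference='{ref}'))",
            data=obs,
        ).fit()
        # Each language term holds the estimated difference from the reference
        # language; its standard error feeds the Tukey-Kramer adjustment.
        for name, coef in fit.params.items():
            if name.startswith("C(language"):
                other = name.split("[T.")[-1].rstrip("]")
                estimates[(ref, other)] = (coef, fit.bse[name])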
The null hypothesis for our tests is that there will be
no difference in estimated average annual productions
per programmer for any of the languages. However, we
find evidence in the data to reject the null hypothesis
for 24 of the 45 pairwise comparisons. The p-values for the comparisons, adjusted using the Tukey-Kramer Honest Significant Difference for multiple comparisons, are listed in Table 4. The shaded cells are the com-
parisons for which we reject the null hypothesis with
95% confidence or greater. To clarify the magnitudes
of the differences, Figure 1 shows the estimated average
annual productions for each language.
Using Table 4 and Figure 1 together we can observe groupings in the languages.
Table 4. Pairwise language comparisons (adjusted p-values)

         JavaScript  Perl   Tcl    Python  PHP    Java   C      C++    C#
Perl     0.46
Tcl      0.60        1.00
Python   0.00        0.00   0.76
PHP      0.00        0.00   0.08   0.72
Java     0.00        0.00   0.02   0.18    1.00
C        0.00        0.00   0.00   0.01    0.53   1.00
C++      0.00        0.00   0.00   0.00    0.01   0.07   0.59
C#       0.00        0.00   0.00   0.02    0.26   0.50   0.83   1.00
Pascal   0.00        0.00   0.00   0.00    0.10   0.26   0.60   0.99   1.00
Figure 1. Estimated Average Productions
Python, for example, which sits near the middle of the range of estimated annual productions, follows a different paradigm from the languages on each end of the range (JavaScript and Perl on the left; C, C++, C#, and Pascal on the right), but it is not significantly different from the other languages near the middle (Tcl, PHP, and Java). Further analysis may reveal that programming language paradigm influences programmer productivity.
6 Conclusions
We find significant evidence in our data that, even
after accounting for variations in programmers and
environments, programming languages are associated
with significant differences in annual programmer pro-
ductivity. The reader must be careful, however, not to
infer a cause-and-effect relationship based solely on this
study. Our analysis relies on observational data gath-
ered from SourceForge.net CVS repositories. This is a
strength in that the data represent an unaltered soft-
ware development environment. However, it does limit
the inferences we can make both in terms of cause-and-
effect and generalization.
Nevertheless, the results of this study suggest a
number of interesting avenues for future research. For
example, there is a general progression in Figure 1 from
newer, higher-level interpreted languages to older, com-
piled languages. This progression may imply a rela-
tionship between the level of abstraction of a language
and the speed at which developers can write source
code in that language. Brooks supported the assump-
tion of constant productivity as “reasonable in terms
of the thought a statement requires and the errors it
may include.” However, it is quite possible that to-
day’s higher-level languages require more thought per
line or allow more errors per line than their predeces-
sors. More research is needed to better understand
the trade-offs between the power provided by languages
with higher levels of abstraction and the cognitive load
placed on their users.
We expect that this model of using large-scale, longi-
tudinal studies of Open Source projects to empirically
test long-held assumptions in software engineering re-
search will become more prevalent as the tools and
methods for collecting and analyzing data from soft-
ware repositories mature. Such studies are necessary
in order to build a more firm foundation for under-
standing the similarities and differences between Open
Source and other software development models.
References
[1] F. P. Brooks. The Mythical Man-Month: Essays on
Software Engineering. Addison Wesley, Boston, MA,
1995.
[2] D. Delorey, C. Knutson, and A. MacLean. A com-
prehensive evaluation of production phase sourceforge
projects: A case study using cvs2mysql and the source-
forge research archive. Manuscript Under Review, 2007.
[3] G. A. Liebchen and M. Shepperd. Software productivity
analysis of a large data set and issues of confidential-
ity and data quality. In Proceedings of the 11th IEEE
International Software Metrics Symposium (METRICS
2005), 2005.
[4] K. D. Maxwell and P. Forselius. Benchmarking software
development productivity. IEEE Software, pages 80–88,
January 2000.
[5] K. D. Maxwell, L. V. Wassenhove, and S. Dutta. Soft-
ware development productivity of European space, mili-
tary, and industrial applications. IEEE Transactions on
Software Engineering, 22(10):706–718, October 1996.
[6] R. Premraj, M. Shepperd, B. Kitchenham, and
P. Forselius. An empirical analysis of software pro-
ductivity over time. In Proceedings of the 11th IEEE
International Software Metrics Symposium (METRICS
2005), 2005.
[7] W. M. Taliaffero. Modularity: the key to system growth
potential. IEEE Software, 1(3):245–257, July 1971.
[8] R. W. Wolverton. The cost of developing large-
scale software. IEEE Transactions on Computers, C-
23(6):615–636, June 1974.