Estimating the Size, Cost, and Types of Technical Debt
Bill Curtis, CAST, Fort Worth, Texas, USA, curtis@acm.org
Jay Sappidi, CAST, New York, NY, USA, j.sappidi@castsoftware.com
Alexandra Szynkarski, CAST, Paris, France, a.szynkarski@castsoftware.com
Abstract: This paper summarizes the results of a study of
Technical Debt across 745 business applications comprising
365 million lines of code collected from 160 companies in 10
industry segments. These applications were submitted to a
static analysis that evaluates quality within and across
application layers that may be coded in different languages.
The analysis consists of evaluating the application against a
repository of over 1200 rules of good architectural and coding
practice. A formula for estimating Technical Debt with
adjustable parameters is presented. Results are presented for
Technical Debt across the entire sample as well as for different
programming languages and quality factors.
Keywords: software metrics; software structural quality;
technical debt; static analysis; benchmarking
I. INTRODUCTION
Although there are several definitions of Technical Debt,
we define it as the cost of fixing structural quality problems
in production code that the organization knows must be
eliminated to control development costs or avoid operational
problems. We believe this is the most relevant definition to
industrial practice where the Technical Debt metaphor has
provided a new means of communicating with executive
management about the costs and risks of poor structural
quality in their application portfolio.
The purpose of this study is to explore a method for
quantifying an estimate of the Technical Debt within a
business application. Such studies are needed to help IT
organizations make visible the costs and risks hidden within
their application portfolio, as well as establish a benchmark
for making decisions about investments in application
quality, and especially structural quality.
Structural quality involves the non-functional, internal
characteristics of software. It reflects the engineering
soundness of an application’s architecture and coding, rather
than the correctness with which the application implements
functional requirements. Structural quality characteristics
are critical because they are often difficult to detect through
standard testing, yet they are frequent causes of operational
problems such as outages, performance degradation,
breaches by unauthorized users, and data corruption [1].
Internal quality metrics have been shown to correlate with
criteria such as maintenance effort and defect detection [2,
3]. The first enumeration of such quality characteristics was
provided by Boehm and his colleagues at TRW in the 1970s
[4].
II. THE SAMPLE AND DATA
The data for this study are drawn from the Appmarq
benchmarking repository maintained by CAST, which comprises 745 applications submitted by 160 organizations for analysis, totaling 365 million lines of code, or 11.3 million Backfired Function Points. No applications were accepted into the sample if they consisted of less than 10 KLOC (thousand lines of code). Sixty applications were between 10 and 20 KLOC, 240 were between 20 and 100 KLOC, 271 were between 100 and 500 KLOC, 82 were between 500 KLOC and 1 million lines of code (MLOC), and 93 were greater than 1 MLOC.
These organizations are located primarily in the United
States, Europe, and India. Since there is no rigorously
developed population description of the global trove of
business applications, it is impossible to assess the
generalizability of these results. Although these results may
not characterize the global population of IT business
applications, they do emerge from what is believed to be the
largest sample of applications ever to be statically analyzed
and measured against internal quality characteristics across
different technologies. Figure 1 presents the number of
applications by industry sector and by language/technology
type. Because of the selection process for submitting
applications to deep structural analysis, we believe this
sample is biased toward business critical applications.
These business applications were analyzed using CAST’s
Application Intelligence Platform (AIP) which performs a
static analysis of an entire application using over 1200 rules
to detect violations of good architectural and coding practice.
These rules have been drawn from an exhaustive study of
software engineering texts, online discussion groups focused
on application best practices and defects, and customer
experience drawn from defect logs and application architects.
Examples of violations in the area of security would include
SQL injection, cross-site scripting, buffer overflows, and
similar problems from the Common Weakness Enumeration
(cwe.mitre.org).
The AIP begins by parsing an entire application at build time to develop a representation of the elements from which the application is built and of its data flows. This analysis is normally performed during the build in order to analyze the source code at the application level across the various language and technical platforms. The AIP includes parsers for 28 languages, such as Java, Java EE, .NET, Visual Basic, JSP, PHP, C, C++, C#, ABAP, XML, JavaScript, SQL, and COBOL, as well as a universal analyzer that provides an 80% parse for languages lacking a dedicated parser.
Once the code is parsed, AIP looks for violations of its
architectural and coding rules and identifies the number of
violations versus the number of opportunities for violations
for each rule. The results are aggregated to the application
level where each violation is weighted by its severity and
summed into both a specific measure for a quality
characteristic (called Health Factors) such as Changeability
or Security, and a Total Quality Index that aggregates scores
for violations across all Health Factors. AIP provides a
series of management reports and a portal that guide
developers to locations in the source code for specific
violations that need remediation. More information about
AIP can be obtained at www.castsoftware.com.
Industry           Total     Language         Total
Energy & Utility      40     C                   14
Financial            150     C++                  9
Insurance             70     .NET                51
IT Consulting        109     J2EE               339
Manufacturing         94     Visual Basic        14
Other                 30     ABAP                59
Government            78     Oracle Forms        39
Retail                32     Oracle ERP          12
Technology            21     COBOL               80
Telecom              121     Mixed & Other      128
Total                745     Total              745
Figure 1. Distribution of applications in the Appmarq sample by
industry segment and language/technology.
The application health factors in AIP were selected after
reviewing ISO/IEC 9126 [5]. However, since the quality
characteristics in this standard have not been defined down
to a level that can be computed from the source code, some
Health Factor names differ from those in 9126, based on the content analyzed and the meaningfulness of the names to users of the technology. The five Health Factors reported in this study are as follows:
• Robustness: the stability of an application and the ease of recovery from failures.
• Performance Efficiency: the responsiveness of the application.
• Security: an application's ability to prevent unwanted intrusions.
• Transferability: the ease with which a new team can understand the application and quickly become productive working on it.
• Changeability: an application's ability to be modified quickly and easily.
In order to provide more standardization for computable measures of internal quality, the Consortium for IT Software Quality (CISQ) [6], sponsored by the Software Engineering Institute at Carnegie Mellon University and the Object Management Group, is developing standard definitions for automatable software quality metrics. CISQ intends to make these metrics as consistent as possible with the emerging ISO 25000 [7] series of standards, which will replace ISO 9126.
The number of rules evaluated for each Health Factor
ranged between 176 and 506. Scores for each of these
internal quality characteristics are aggregated from the
component to the application level and reported on a scale of
1 (high risk) to 4 (low risk), using an algorithm that weights
the severity of each violation and its relevance to each
individual Health Factor.
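To make this aggregation concrete, the short Python sketch below shows one illustrative way of turning rule-level violation counts into a Health Factor score on the 1 (high risk) to 4 (low risk) scale. The rule names, severity weights, relevance weights, and the linear mapping onto the 1-4 scale are assumptions chosen for the example; they are not the proprietary weighting algorithm used by AIP.

# Illustrative scoring sketch; the weights and the mapping to the 1-4 scale
# are assumptions, not the actual AIP algorithm.
RULES = [
    # (rule id, violations, opportunities, severity weight, relevance to this Health Factor)
    ("avoid-sql-injection",          4, 250, 9, 1.00),
    ("limit-cyclomatic-complexity", 37, 400, 5, 0.50),
    ("avoid-empty-catch-blocks",    12, 300, 3, 0.25),
]

def health_factor_score(rules):
    """Aggregate rule-level compliance into a score from 1 (high risk) to 4 (low risk)."""
    weighted_total = 0.0
    weight_sum = 0.0
    for _rule_id, violations, opportunities, severity, relevance in rules:
        compliance = 1.0 - violations / opportunities  # share of opportunities with no violation
        weight = severity * relevance
        weighted_total += compliance * weight
        weight_sum += weight
    compliance_ratio = weighted_total / weight_sum     # 0.0 (worst) to 1.0 (best)
    return 1.0 + 3.0 * compliance_ratio                # map onto the 1-4 reporting scale

print(round(health_factor_score(RULES), 2))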
III. THE TECHNICAL DEBT METAPHOR
Ward Cunningham initiated the Technical Debt metaphor
in 1992 by referring to violations of good architectural and
coding practice as ‘debt’. According to Cunningham,
“Shipping first time code is like going into debt. A little debt
speeds development so long as it is paid back promptly with
a rewrite…The danger occurs when the debt is not repaid.
Every minute spent on not-quite-right code counts as interest
on that debt. Entire engineering organizations can be brought
to a stand-still under the debt load of an unconsolidated
implementation.”
However, the fundamental problem underlying Technical
Debt was formulated in the 1970s by Meir Lehman [8] who
posited in one of his laws of software evolution that as a
“system evolves its complexity increases unless work is done
to maintain or reduce it.” This complexity degrades the
performance of business applications, increases their
likelihood of failure, and multiplies the cost of owning them.
Technical Debt is created when developers write
software that violates good architectural or coding practices,
creating structural flaws in the code. Although Cunningham
was only referring to structural problems that result from
conscious design tradeoffs or coding shortcuts to get
functionality running quickly, we embrace a broader
approach to include all structural problems that an IT
organization prioritizes as ‘must fix’. According to industry
luminary Steve McConnell [9], sometimes Technical Debt is
an unintentional consequence of inexperience or incorrect
assumptions, while in other cases it is intentional, as in
Cunningham’s definition, in order to get new functionality
running quickly. In either case, the development team
knows, or ultimately learns, that it has released software with
structural flaws that must be fixed or the cost and risk of the
application will grow unacceptably.
Technical Debt must be distinguished from defects or
failures. Failures during test or operation may be symptoms
of Technical Debt, but most of the structural flaws creating
Technical Debt have not caused test or operational failures.
Some may never cause test or operational failures but instead
make an application less efficient, less scalable, more
difficult to enhance, or more penetrable by hackers. In
essence, Technical Debt emerges from poor structural
quality and affects a business both as IT cost and business
risk.
Choosing ‘debt’ as a metaphor engages a set of financial
concepts that help executives think about software quality in
business terms. In this section we will define the concepts
required to apply the full Technical Debt metaphor so that
each factor can be measured and used in analyzing the
structural quality of applications financially.
Technical Debt: the future costs attributable to known structural flaws in production code that need to be fixed, a cost that includes both principal and interest. A structural flaw in production code is only included in Technical Debt calculations if those responsible for the application believe it is a 'must-fix' problem. Technical Debt is a primary component of the cost of application ownership.
Principal: the cost of remediating must-fix problems in production code. At a minimum, the principal is calculated from the number of hours required to remediate must-fix problems in production code, multiplied by the fully burdened hourly cost of those involved in designing, implementing, and testing these fixes.
Interest: the continuing costs, primarily in IT, attributable to must-fix problems in production code. These continuing costs can result from the excessive effort required to modify unnecessarily complex code, greater resource usage by inefficient code, and similar costs.
Business risk: the potential costs to the business if must-fix problems in production code cause damaging operational events or other problems that reduce the value to be derived from the application.
Liability: the costs to the business resulting from operational problems caused by flaws in production code. Such operational problems would include outages, incorrect computations, lost productivity from performance degradation, and security breaches. From a risk perspective, flaws in the code include both must-fix problems included in the calculation of Technical Debt and problems not listed as must-fix because their risk was underestimated.
Risk: the potential liability to the business if a must-fix problem in production code were to cause a liability-inducing event. Risk is expressed in terms of potential liability to the business rather than the IT costs, which are accounted for under 'interest'.
Opportunity cost: benefits that could have been achieved had resources been committed to developing new capability rather than being assigned to retire Technical Debt. Opportunity cost represents the tradeoff that application managers and executives must weigh when deciding how much effort to devote to retiring Technical Debt.
Structural quality problems give rise to Technical Debt,
which contains both principal and interest on the debt. The
cost to fix these structural problems constitutes the principal
of this debt. Structurally flawed code creates inefficiencies
such as greater maintenance effort or excessive computing
resources whose costs represent interest on the debt.
The structural problems underlying Technical Debt also create business risks. When these risks translate into negative operational events such as outages and security breaches, they create a liability, such as lost revenue from Website sales or costly clean-up from a security breach.
Remediating Technical Debt requires schedule and effort
that could have been devoted to creating new business
functionality. Effort committed to retiring Technical Debt
represents an opportunity cost related to lost benefits that
might otherwise have been achieved by the business.
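Before turning to estimation, it may help to see these cost components laid out as a simple data structure. The Python sketch below is only a bookkeeping aid whose field names mirror the definitions above; the example values are invented and the class is not part of any measurement tool.

from dataclasses import dataclass

@dataclass
class TechnicalDebtProfile:
    """Cost components of the Technical Debt metaphor for one application (values in USD)."""
    principal: float         # cost to remediate the must-fix structural problems
    interest: float          # continuing IT costs attributable to those problems
    business_risk: float     # potential liability if the problems cause operational events
    opportunity_cost: float  # value of new capability foregone while retiring the debt

    def technical_debt(self) -> float:
        # As defined above, Technical Debt includes both principal and interest.
        return self.principal + self.interest

# Invented example values for a single application.
profile = TechnicalDebtProfile(principal=361_000, interest=90_000,
                               business_risk=500_000, opportunity_cost=120_000)
print(profile.technical_debt())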
IV. ESTIMATING PRINCIPAL IN TECHNICAL DEBT
There is no exact measure of Technical Debt, since its
calculation must be based only on the structural flaws that
the organization intends to fix, some of which may not have
been detected yet. However, modern software analysis and
measurement technology allows us to estimate the amount of
principal in the Technical Debt of an application based on
actual counts of detectable structural problems. By
analyzing the structural quality of an application, rating the
severity of each problem, and prioritizing the must-fix
problems, IT organizations can now estimate the amount of
Principal in the Technical Debt (hereafter called TD-
Principal) from empirical evidence.
Within this context, TD-Principal is a function of three variables: the number of must-fix problems in an
application, the time required to fix each problem, and the
cost for fixing a problem. Each of these variables can be
measured or estimated and entered into a formula for
estimating TD-Principal. This formula produces results that
do not include interest on the debt, liability, or any of the
other costs associated with Technical Debt other than the
principal.
The number of must-fix structural problems in an
application can be measured through the static analysis of an
application’s source code. However, with limited
application budgets, IT organizations will never fix all the
problems in an application. Therefore each of the structural
problems detected through static analysis must be weighted
by its potential severity. If severity scores are grouped into
categories (for instance, low, medium, and high), then IT
management can determine what percentage of problems in
each category are must-fix.
The time to fix a structural quality problem includes the
time to analyze the problem, the time to understand the code
and determine a correction, the time to evaluate potential
side-effects, the time to implement and test the correction,
and the time to release the correction into operations.
The third variable, the cost of fixing a problem, can be set to the average burdened hourly rate for the developers assigned to fix structural problems. Although
burdened hourly rates may vary by experience and location,
we have found that a rate of between $70 and $80 per hour
reflects the average costs for many IT organizations. If an
organization’s labor rates vary widely, this variable can also
be measured as a frequency distribution of costs.
Although the data presented here were calculated using
the TD-Principal formula as parameterized above, different
assumptions about the parameters might be more appropriate
for the specific conditions within different organizations.
We encourage organizations to adjust the parameters in this
formula to best fit their objectives, experience, and costs.
V. INITIAL RESULTS IN MEASURING TD-PRINCIPAL
In an initial exploration of measuring TD-Principal, we
assumed that an IT organization would fix 50% of the high
severity problems, 25% of the medium severity problems,
and no more than 10% of the low severity problems. To
keep the estimate of TD-Principal conservative we assumed
that defects would be fixed in 1 hour. However, this number
appears to describe only the repair of simple violations in
single components. We set the labor rate to an average of
$75 per hour.
However, the parameters in this formula can be easily
adjusted to better reflect the experience and objectives of a
specific organization. For instance, an organization can set
the parameters for the percentage of problems it will fix in
each severity category according to its own maintenance and
structural quality objectives. In future work we anticipate
changing the parameters to 0% of low severity violations, 50% of medium severity violations, and 100% of high severity violations, since field discussions with IT organizations suggest
that they are primarily interested in fixing high priority
defects and some medium priority defects.
In this initial exploration we made the very conservative assumption that all problems would be fixed within one hour; this parameter value was chosen deliberately so that our initial TD-Principal results would err on the low side. However, preliminary data from operational environments show wide variation in correction times based on the complexity of the structural problems involved. Based on distributions from the limited data available from operational environments, we anticipate using a parameterized Weibull distribution to represent fix times in future calculations.
The initial formula and parameterization we used for
calculating TD-Principal in this paper are as follows.
TD-Principal =
  ((Σ high severity violations) × 0.50 × 1 hr × $75) +
  ((Σ medium severity violations) × 0.25 × 1 hr × $75) +
  ((Σ low severity violations) × 0.10 × 1 hr × $75)
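For readers who wish to experiment with the calculation, the Python sketch below implements the formula above with its parameters exposed for adjustment. The percentage-fixed values, the flat one-hour fix time, and the $75 hourly rate are the conservative defaults used in this paper; the Weibull helper only illustrates the anticipated future parameterization of fix times, and its shape and scale values are assumptions.

from math import gamma

def td_principal(high, medium, low,
                 pct_fix=(0.50, 0.25, 0.10),  # share of violations fixed, by severity
                 hours_per_fix=1.0,           # conservative flat fix time (hours)
                 rate_per_hour=75.0):         # average burdened labor rate (USD)
    """Estimate TD-Principal from violation counts by severity."""
    counts = (high, medium, low)
    return sum(count * pct * hours_per_fix * rate_per_hour
               for count, pct in zip(counts, pct_fix))

def weibull_mean_hours(shape, scale):
    """Expected fix time if fix times follow a Weibull(shape, scale) distribution."""
    return scale * gamma(1.0 + 1.0 / shape)

# Hypothetical violation counts for a single application.
print(td_principal(high=1_200, medium=3_500, low=9_000))
# Same application, replacing the flat 1-hour assumption with an assumed Weibull mean.
print(td_principal(1_200, 3_500, 9_000, hours_per_fix=weibull_mean_hours(1.5, 2.0)))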
To develop an initial estimate of the average TD-Principal across the Appmarq sample, we first calculated TD-Principal
individually for each of the 745 applications using the
formula presented above. These individual application
scores were then averaged across the Appmarq sample to
produce an average TD-Principal of $3.61 per line of code.
Based on this formulation, a typical application accrues
$361,000 of TD-Principal for each 100,000 lines of code,
and applications of 300,000 or more lines carry more than $1
million of TD-Principal ($1,083,000). This is an estimate of
the cost to repair only the must-fix problems and is
conservative based on the initial parameter values chosen.
Had we used the parameters we anticipate using in the
future, the TD-Principal per line of code would have been closer to $10, which is in line with estimates provided by some analysts.
Although IT organizations could estimate their total TD-
Principal by multiplying an estimate of the size of their
application code base by $3.61, it would be more accurate to
analyze it by technology and language type. Significant
differences were found in TD-Principal estimates between
languages, with the lowest figure being reported for ABAP
($0.43) and the highest for Java-EE ($5.42). C++ ($4.33)
and Oracle Forms also had above average TD-Principal
estimates. Since these figures are based on very conservative
parameters, the actual TD-Principal in most applications is
likely to be significantly higher. Thus, these estimates
should be treated as lower bounds.
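As an illustration of working from language-specific rates rather than the overall $3.61 average, the sketch below multiplies hypothetical per-language code-base sizes by the conservative per-LOC figures reported above. The LOC counts are invented for the example, and because the per-LOC rates are lower bounds, so is the resulting portfolio estimate.

# Conservative per-LOC TD-Principal estimates from the Appmarq sample (lower bounds, USD).
TD_PER_LOC = {"ABAP": 0.43, "Java EE": 5.42, "C++": 4.33}

# Hypothetical portfolio: lines of code per language (assumed for illustration).
portfolio_loc = {"ABAP": 2_000_000, "Java EE": 5_000_000, "C++": 1_500_000}

portfolio_td = sum(loc * TD_PER_LOC[lang] for lang, loc in portfolio_loc.items())
print(f"Estimated portfolio TD-Principal: ${portfolio_td:,.0f}")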
The greatest variability in TD-Principal results occurred
for C++ (s.d.=$7.02) and Oracle Forms (s.d.=$6.70). These
results demonstrate that even for applications developed
using the same language and technology, TD-Principal
results can vary widely. Consequently, in order to be used
effectively for management decisions, TD-Principal should
be measured and analyzed individually for each application,
or at a minimum for each category of applications, rather than using
an average value across all applications regardless of the
language or technology platform on which the application
was developed.
These figures could change if the mix of application characteristics in each technology/language category changes as the Appmarq sample of applications grows. Consequently, we urge caution in interpreting these figures as industry benchmarks, especially since they are based on very conservative assumptions. Nevertheless, they provide a starting point for estimating TD-Principal, and one that can be adjusted based on different assumptions about the parameters used in the formula presented above.
VI. COMPONENTS OF TECHNICAL DEBT
Although TD-Principal can be measured as violations of
good structural quality, these violations consist of different
types of threats to the business or costs to IT. In order to use
TD-Principal effectively in making decisions about how
much resource to allocate to eliminating these violations,
management needs to distinguish among its quality priorities
and then prioritize the importance of eliminating TD-
Principal in each area. Our data allow us to measure the TD-
Principal associated with each of the five Health Factors
since they represent different types of costs to IT or risks to
the business.
The amount of TD-Principal in an application associated
with each of these Health Factors differs. Seventy percent of
the TD-Principal measured in this sample was contained in
the IT cost related Health Factors of Changeability (30%)
and Transferability (40%). Thirty percent of the TD-
Principal was associated with the business risk Health
Factors of Robustness (18%), Security (7%), and
Performance Efficiency (5%). We cannot determine from
the data whether IT organizations are spending more time
eliminating TD-Principal related to business risk or whether
TD-Principal is disproportionately created in IT cost-related
factors. Nevertheless, a single high severity violation related
to business risk can be devastating if it eventually causes an
operational problem.
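To make the decomposition concrete, the sketch below splits an application's estimated TD-Principal across the five Health Factors using the sample-wide percentages reported above, and then totals the IT-cost-related and business-risk-related portions. The $361,000 input is simply the 100 KLOC example from Section V; an individual application's actual distribution may differ from these sample-wide shares.

# Sample-wide shares of TD-Principal by Health Factor (Appmarq sample).
SHARES = {
    "Changeability": 0.30,           # IT cost related
    "Transferability": 0.40,         # IT cost related
    "Robustness": 0.18,              # business risk related
    "Security": 0.07,                # business risk related
    "Performance Efficiency": 0.05,  # business risk related
}
IT_COST_FACTORS = {"Changeability", "Transferability"}

td_principal = 361_000  # example: a 100 KLOC application at $3.61 per line of code

by_factor = {factor: td_principal * share for factor, share in SHARES.items()}
it_cost_td = sum(value for factor, value in by_factor.items() if factor in IT_COST_FACTORS)
business_risk_td = td_principal - it_cost_td

for factor, value in by_factor.items():
    print(f"{factor}: ${value:,.0f}")
print(f"IT cost related: ${it_cost_td:,.0f}; business risk related: ${business_risk_td:,.0f}")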
Although the comparative percentages of TD-Principal
remain generally consistent among Health Factors across
language/technology categories, some variation is apparent.
In particular the TD-Principal scores for Robustness appear
much higher for ABAP (42%), Oracle Forms (32%), and
Visual Basic (23%).
These results indicate that the analysis and measurement
of TD-Principal can guide critical management decisions
about how to allocate resources for reducing business risk
and IT cost. Trying to make decisions about retiring TD-
Principal at a global level is overwhelming and it is difficult
to visualize what the expected payoff will be. However,
when TD-Principal can be analyzed into its constituent parts,
management can set specific reduction targets based on
strategic quality priorities with an expectation of the benefit
to be achieved.
For instance, removing the highest severity violations
affecting Robustness reduces the risk of catastrophic
operational crashes, thus improving IT’s ability to achieve
availability targets. As IT collects more data, management
will be able to develop a quantitative understanding of how
much TD-Principal related to Robustness it can sustain in an
application without risking its availability goals. Such
reasoning can be applied to decisions regarding the amounts
to invest in reducing the TD-Principal associated with each
Health Factor. When TD-Principal is measured and
estimated, it will become a standard referent for managing
applications and portfolios. Further exploration of these and
related results can be found in the CRASH Report [10].
REFERENCES
[1] Spinellis, D. (2006). Code Quality: The Open Source Perspective.
Boston: Addison-Wesley.
[2] Curtis, B., Sheppard, S.B., Milliman, P., Borst, A., & Love, T.
(1979a). Measuring the psychological complexity of software
maintenance tasks with the Halstead and McCabe metrics. IEEE
Transactions on Software Engineering, 5 (2), 96-104.
[3] Curtis, B., Sheppard, S.B., and Milliman, P. (1979b). Third time
charm: Stronger prediction of programmer performance by software
complexity metrics. Proceedings of the 4th International Conference
on Software Engineering. Washington, DC: IEEE Computer Society,
356-360.
[4] Boehm, B.W., Brown, J.R., & Lipow, M. (1976). Quantitative
evaluation of software quality. Proceedings of the 2nd International
Conference on Software Engineering. Los Alamitos, CA: IEEE
Computer Society Press, 592-605.
[5] ISO/IEC JTC 1, SC 7 (2001). ISO 9126. Geneva: ISO.
[6] Consortium for IT Software Quality (2010). www.it-cisq.org.
[7] ISO/IEC JTC1/SC7 (2010). ISO 25000. Montreal: École de
technologie supérieure Department of Software and IT Engineering,
1100 Notre Dame Ouest, Montréal, Québec Canada H3C 1K3.
[8] Lehman, M. M. (1980). Programs, life cycles, and laws of software
evolution. Proceedings of the IEEE, 68 (9), 1060-1076.
[9] McConnell, S. (2007). Technical Debt.
http://blogs.construx.com/blogs/stevemcc/archive/2007/11/01/technic
al-debt-2.aspx.
[10] Sappidi, J., Curtis, B., & Szynkarski, A. (2010). CRASH Report:
CAST Report on Application Software Health, 2011/2012. New
York: CAST Software.