Kim Herzig

Microsoft · Tools for Software Engineers

PhD in Computer Science

About

36 Publications
7,707 Reads
1,561 Citations
Introduction
In general, my research is concerned with empirical software engineering, in particular with mining software repositories and software testing processes. Currently, I am assessing the test and verification infrastructure and processes at Microsoft. The goal is to work with individual product groups to propose optimizations that make Microsoft's testing and verification process more effective and efficient without sacrificing code quality.
Additional affiliations
February 2013 - January 2015
Microsoft
Position
  • Postdoctoral Researcher
November 2007 - February 2013
Universität des Saarlandes
Position
  • PhD Student

Publications

Chapter
Over the years, it has become common practice in empirical software engineering to mine data from version archives and bug databases to learn where bugs have been fixed in the past, or to build prediction models to find error-prone code in the future. However, most of these approaches rely on strong assumptions that need to be verified to ensure that...
Chapter
Software is present in nearly every aspect of our daily lives and also dominates large parts of the high-tech consumer market. Consumers love new features, and new features are what make them buy software products, while qualities like reliability, security, and privacy are taken for granted. To respond to the consumer market demand, many software producers...
Conference Paper
Vulnerability prediction models (VPMs) are believed to hold promise for guiding software engineers on where to prioritize precious verification resources when searching for vulnerabilities. However, while Microsoft product teams have adopted defect prediction models, they have not adopted VPMs. The goal of thi...
Article
When interacting with a source control management system, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing version histories, such tangled changes will make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of fiv...
Chapter
Software systems control many aspects of our daily lives, yet no system is perfect. Many of our day-to-day experiences with computer programs are related to software bugs. Although software bugs are very unpopular, empirical software engineers and software repository analysts rely on bugs, or at least on those bugs that get reported to iss...
Conference Paper
Software quality is one of the most pressing concerns for nearly all software-developing companies. At the same time, software companies also seek to shorten their release cycles to meet market demands while maintaining their product quality. Identifying problematic code areas is becoming increasingly important. Defect prediction models became popular...
Conference Paper
Context: Software testing is a crucial step in most software development processes. Testing software is a key component in managing and assessing the risk of shipping quality products to customers. But testing is also an expensive process, and changes to the system need to be tested thoroughly, which may take time. Thus, the quality of a software product...
Conference Paper
When analyzing version histories, researchers have traditionally focused on single events, e.g., the change that causes a bug or the fix that resolves an issue. Sometimes, however, indirect effects are what count: changing a module may lead to plenty of follow-up modifications in other places, giving the initial change an impact on those later...
Conference Paper
Full-text available
In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified—that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug predic...
Conference Paper
When interacting with version control systems, developers often commit unrelated or loosely related code changes in a single transaction. When analyzing the version history, such tangled changes will make all changes to all modules appear related, possibly compromising the resulting analyses through noise and bias. In an investigation of five open-...
Conference Paper
Full-text available
Fuzz testing is an automated technique that provides random data as input to a software system in the hope of exposing a vulnerability. In order to be effective, the fuzzed input must be common enough to pass elementary consistency checks; a JavaScript interpreter, for instance, would only accept a semantically valid program. On the other hand, the fuzze...
Thesis
Developers change source code to add new functionality, fix bugs, or refactor their code. Many of these changes have immediate impact on quality or stability. However, some impact of changes may become evident only in the long term. This thesis makes use of change genealogy dependency graphs modeling dependencies between code changes capturing how...
Conference Paper
Full-text available
Software reliability is heavily impacted by software changes. How do these changes relate to each other? By analyzing the impacted method definitions and usages, we determine dependencies between changes, resulting in a change genealogy that captures how earlier changes enable and cause later ones. Model checking this genealogy reveals temporal proc...
Conference Paper
Full-text available
Several defect prediction models have been proposed to identify which entities in a software system are likely to have defects before its release. This paper presents a replication of one such study conducted by Zimmermann and Nagappan on Windows Server 2003 where the authors leveraged dependency relationships between software entities captured usi...
Conference Paper
Full-text available
Changing source code in large software systems is complex and requires a good understanding of dependencies between software components. Modifying components with little regard to dependencies may have an adverse impact on the quality of the latter, i.e., increase their risk of failure. We conduct an empirical study to understand the relationshi...
Conference Paper
Full-text available
In software development, every change induces a risk. What happens if code changes again and again in some period of time? In an empirical study on Windows Vista, we found that the features of such change bursts have the highest predictive power for defect-prone components. With precision and recall values well above 90%, change bursts significantl...
Conference Paper
Full-text available
Developers change source code to add new functionality, fix bugs, or refactor their code. Many of these changes have immediate impact on quality or stability. However, some impact of changes may become evident only in the long term. The goal of this thesis is to explore the long-term impact of changes by detecting dependencies between code changes...
Conference Paper
By integrating various development and collaboration tools into a single platform, the Jazz environment offers several opportunities for software repository miners. In particular, Jazz offers full traceability from the initial requirements via work packages and work assignments to the final changes and tests; all these features can be easily acce...
Conference Paper
Full-text available
Which components of a large software system are the most defect-prone? In a study on a large SAP Java system, we evaluated and compared a number of defect predictors, based on code features such as complexity metrics, static error detectors, change frequency, or component imports, thus replicating a number of earlier case studies in an industrial c...
Article
Full-text available
Given a large body of code, how do we know where to focus our quality assurance effort? By mining the software’s defect history, we can automatically learn which code features correlated with defects in the past—and leverage these correlations for new predictions: “In the past, high inheritance depth was an indicator of a high number of defects. Si...
Article
Full-text available
Changing source code in large software systems is complex and requires a good understanding of dependencies between software components. Modifying components with little regard to dependencies may have an adverse impact on the quality of the latter, i.e., increase their risk of failure. We conduct an empirical study to understand the r...
Article
Full-text available
When developers commit software changes to a version control system, they often commit unrelated changes in a single transaction—simply because, while, say, fixing a bug in module A, they also came across a typo in module B, and updated a deprecated call in module C. When analyzing such archives later, the changes to A, B, and C are treated as bein...

Questions

Question
I’m on the program committee of the “Software Engineering in Practice” track at the ICSE conference next year. Usually submissions are longer experience reports or case studies (up to ten pages), but we are also looking for panels or (new this year) talk proposals. Talk proposals are fairly lightweight (max 150+500 words + keywords + bios). All submissions are due by October 24.
The ICSE conference is in Florence, Italy, which I’ve heard is very nice in May. You can find more information here: http://2015.icse-conferences.org/call-dates/call-for-contributions/seip
