Chapter

How Do Quantifiers Affect the Quality of Requirements?

To read the full-text of this research, you can request a copy directly from the authors.

Abstract

[Context] Requirements quality can have a substantial impact on the effectiveness and efficiency of using requirements artifacts in a development process. Quantifiers such as “at least”, “all”, or “exactly” are common language constructs used to express requirements. Quantifiers can be formulated as affirmative phrases (“At least”) or negative phrases (“Not less than”). [Problem] It has long been assumed that negation in quantification negatively affects the readability of requirements; however, empirical research on this topic remains sparse. [Principal Idea] In a web-based experiment with 51 participants, we compare the impact of negations and quantifiers on readability in terms of reading effort, reading error rate, and perceived reading difficulty of requirements. [Results] For 5 out of 9 quantifiers, our participants performed better on the affirmative phrase than on the negative phrase. For only one quantifier was the negative phrase more effective. [Contribution] This research focuses on creating an empirical understanding of the effect of language in Requirements Engineering. It furthermore provides concrete advice on how to phrase requirements.

... The formal representation of requirements is supposed to overcome some of the deficiencies of natural language requirements, especially lack of precision and non-machine readability [1][2][3][4]. However, if requirements are formulated in a formal logic such as temporal logic, they are accessible to only a restricted group of requirement engineers. ...
... (2) We performed a replication of the first study with a notably larger number of participants to increase the confidence in the obtained results and the assumptions we made regarding the understanding of trivial cases. (3) We performed an empirical study to investigate the effect of the suggested pattern modifications on the intuitive understanding of our pattern language. ...
... For patterns whose meaning is inverse to an already added pattern (e.g., it is always the case that R holds and it is never the case that R holds), we only included the positively formulated pattern in the survey. We do not assume that negative and positive formulations behave similarly and are aware that different formulations may lead to vastly different error counts, as shown by Winter et al. [3]. However, we assume that the use of the negative formulation does not provide any additional insights into the pattern formulation in general. ...
Article
Full-text available
Formal pattern languages are used in industry to communicate and analyse requirements, as they are said to be both machine-readable and intuitively understandable for humans. The questions arise to what extent this intuitive understanding of a pattern language is in agreement with its formal semantics and whether this understanding can be increased systematically. We present two consecutive empirical experiments to address these questions. The formal semantics serves as an objective judge on the intuitive understanding. Our experiments confirm the practical usefulness of HanforPL insofar as the intuition matches the formal semantics in most practically relevant cases. They also reveal a number of edge cases where even a prior exposure to formal logic is not a guarantee for correct understanding. We present and validate systematic adjustments to the patterns, which lead to several large increases in understandability but come at the cost of new, though less impactful, ambiguities. We demonstrate how an inquiry on the alignment of the intuitive and formal semantics of a pattern language can help to understand and improve the language. While results regarding the understandability of HanforPL are favourable in commonly used cases, there is potential for improvement. The systematic adaption of patterns shows that small modifications may have large effects on the alignment of formal and intuitive semantics, and that modification must be considered with caution in the context of the respective pattern to avoid unintentionally adding new ambiguities. This article is an extension of our published REFSQ paper.
... To this end, following the guidelines of [85], the quantifier definitions are grouped into three distinct syntax categories, namely <affirmative>, <negative> and <closed-interval>. The current set of templates does not use all possible quantifiers, but the syntax structure clearly foresees the need for more templates to address evolving specification needs. ...
Article
Full-text available
System requirements specify how a system meets stakeholder needs. They are a partial definition of the system under design in natural language that may be restricted in syntax terms. Any natural language specification inevitably lacks a unique interpretation and includes underspecified terms and inconsistencies. If the requirements are not validated early in the system development cycle and refined, as needed, specification flaws may cause costly cycles of corrections in design, implementation and testing. However, validation should be based on a consistent interpretation with respect to a rigorously defined semantic context of the domain of the system. We propose a specification approach that, while sufficiently expressive, restricts the requirements definition to terms from an ontology with precisely defined concepts and semantic relationships in the domain of the system under design. This enables a series of semantic analyses, which guide the engineer towards improving the requirement specification as well as eliciting tacit knowledge. The problems addressed are prerequisites to enable the derivation of verifiable specifications, which is of fundamental importance for the design of critical embedded systems. We present the results from a case study of modest size from the space system domain, as well as an evaluation of our approach from the user's point of view. The requirement types that have been covered demonstrate the applicability of the approach in an industrial context, although the effectiveness of the analysis depends on pre-existing domain ontologies.
... Berry and Kamsties [2] show that indefinite quantifiers can lead to misunderstandings. Winter et al. [29] show that negative phrasing of quantifiers is more ambiguous than affirmative phrasing. Femmer et al. [12] reveal that the use of passive voice leads to ambiguity in requirements. ...
Chapter
Full-text available
Context: Conditional statements like “If A and B then C” are core elements for describing software requirements. However, there are many ways to express such conditionals in natural language and also many ways in which they can be interpreted. We hypothesize that conditional statements in requirements are a source of ambiguity, potentially affecting downstream activities such as test case generation negatively. Objective: Our goal is to understand how specific conditionals are interpreted by readers who work with requirements. Method: We conduct a descriptive survey with 104 RE practitioners and ask how they interpret 12 different conditional clauses. We map their interpretations to logical formulas written in Propositional (Temporal) Logic and discuss the implications. Results: The conditionals in our tested requirements were interpreted ambiguously. We found that practitioners disagree on whether an antecedent is only sufficient or also necessary for the consequent. Interestingly, the disagreement persists even when the system behavior is known to the practitioners. We also found that certain cue phrases are associated with specific interpretations. Conclusion: Conditionals in requirements are a source of ambiguity and there is not just one way to interpret them formally. This affects any analysis that builds upon formalized requirements (e.g., inconsistency checking, test-case generation). Our results may also influence guidelines for writing requirements.
Article
Full-text available
Various semi-formal syntax templates for natural language requirements aim to reduce ambiguity while preserving human readability. Existing studies on their effectiveness focus on individual notations only and do not allow a systematic investigation of quality benefits. We strive for a comparative benchmark and evaluation of template systems to assist practitioners in selecting appropriate ones and enable researchers to work on pinpoint improvements and domain-specific adaptions. We conduct comparative experiments with five popular template systems: EARS, Adv-EARS, Boilerplates, MASTeR, and SPIDER. First, we compare a control group of free-text requirements and treatment groups of their variants following the different templates. Second, we compare MASTeR and EARS in user experiments for reading and writing. Third, we analyse all five meta-models’ formality and ontological expressiveness based on the Bunge-Wand-Weber reference ontology. The comparison of the requirement phrasings across seven relevant quality characteristics and a dataset of 1764 requirements indicates that, except SPIDER, all template systems have positive effects on all characteristics. In a user experiment with 43 participants, mostly students, we learned that templates are a method that requires substantial prior training and that profound domain knowledge and experience are necessary to understand and write requirements in general. The evaluation of the template systems’ meta-models suggests different levels of formality, modularity, and expressiveness. MASTeR and Boilerplates provide high numbers of variants to express requirements and achieve the best results with respect to completeness. Templates can generally improve various quality factors compared to free text. Although MASTeR leads the field, there is no conclusive favourite choice, as most effect sizes are relatively similar.
Chapter
[Context and motivation] Formal pattern languages with a restricted English grammar, such as the pattern language of Konrad and Cheng, give us the possibility to combine human intuition and the rigour of a machine. [Question/problem] The question arises to what extent the intuitive understanding of such a pattern language is in agreement with its formal semantics. [Principal ideas/results] We present an empirical study to address this question. The existence of a formal semantics allows us to use the machine as an objective judge to decide if the intuitive understanding is correct. The study confirms empirically the practical usefulness of HanforPL in that the intuitive understanding matches the formal semantics in most practically relevant cases. The study reveals that a number of phrases of interest represent critical edge cases where even a prior exposure to formal logic is not a guarantee for the correct intuitive understanding. [Contribution] We show how the alignment of formal and intuitive semantics can be investigated, and that this alignment can not simply be assumed. Nonetheless, results regarding the understandability of HanforPL are favourable with high understandability in commonly used patterns. The results of the study will be the basis of improvements in HanforPL.

Keywords: Pattern Languages, Formal Requirements, Intuitive Understanding, Empirical Study
Article
Full-text available
Software quality in use (QinU) relates to human-software interactions when a software product is used in a particular context. Currently, QinU measurement models are bound to ineffective measurement formulation and many models are subjectively incoherent. This paper proposes a novel QinU framework (QinUF) to measure QinU by consuming software reviews. The framework has three components: QinU prediction, polarity classification, and QinU scoring. The QinU prediction component computationally maps software review sentences to their respective QinU characteristics (topics) of the ISO 25010 model based on a text similarity measure. The topic prediction problem is framed as a text-to-text similarity problem, where the first text (test) is the actual unlabeled review sentence and the second text is the set of selected features (keywords) from a benchmark dataset. The polarity classification component classifies each test sentence according to its polarity orientation; the respective sentiment values are recorded. To score QinU, the sentiment values are grouped and summarized into their respective QinU topics. The QinUF evaluation over real-life scenarios showed that the QinUF automates software QinU measurement; therefore, users could compare and acquire software on the fly. The framework is consistent and superior to the related works it was compared with.
Article
Full-text available
A study of the time required to complete ambiguous sentences suggested that even though Ss are unaware of the ambiguity while completing sentences, they take more time to complete ambiguous sentences than unambiguous ones; the degree of difficulty in completing ambiguous sentences is related to the linguistic level at which the ambiguity occurs; sentences containing two ambiguities are more difficult to complete than those containing only one, and when these two ambiguities occur at different linguistic levels, these sentences are harder to complete than when both occur within the same linguistic level; and ambiguity may affect the grammaticality and relevance of completions and may cause stuttering and laughter, even without awareness of the ambiguity. An attempt to fit these results to several theories of the processing of ambiguous sentences led us to the conclusion that ambiguity interferes with our understanding of a single meaning of a sentence, and that the degree of interference varies with the linguistic level at which the ambiguity occurs.
Article
The quality of requirements engineering artifacts is widely considered a success factor for software projects. Currently, the definition of high-quality or good RE artifacts is often provided through normative references, such as quality standards, textbooks, or generic guidelines. We see various problems with such normative references: (1) It is hard to ensure that the contained rules are complete, (2) the contained rules are not context-dependent, and (3) the standards lack precise reasoning why certain criteria are considered bad quality. To change this understanding, we postulate that creating an RE artifact is rarely an end in itself, but just a means to understand and reach the project's goals. Following this line of thought, the purpose of an RE artifact is to support the stakeholders in whatever activities they are performing in the project. This purpose must define high-quality RE artifacts. To express this view, we contribute an activity-based RE quality meta model and show applications of this paradigm. Lastly, we describe the impacts of this view on research and practice.
Article
Context: Bad requirements quality can cause expensive consequences during the software development lifecycle, especially if iterations are long and feedback comes late. Objectives: We aim at a light-weight static requirements analysis approach that allows for rapid checks immediately when requirements are written down. Method: We transfer the concept of code smells to Requirements Engineering as Requirements Smells. To evaluate the benefits and limitations, we define Requirements Smells, realize our concepts for smell detection in a prototype called Smella, and apply Smella in a series of cases provided by three industrial contexts and one university context. Results: The automatic detection yields an average precision of 59% at an average recall of 82% with high variation. The evaluation in practical environments indicates benefits such as an increase in the awareness of quality defects. Yet, some smells were not clearly distinguishable. Conclusion: Lightweight smell detection can uncover many practically relevant requirements defects in a reasonably precise way. Although some smells need to be defined more clearly, smell detection provides a helpful means to support quality assurance in Requirements Engineering, for instance, as a supplement to reviews.
Article
Reading times were collected for sentences in passages in order to examine how cognitive resources are distributed among different components of reading. Multiple regression analyses indicated that most of the reading time variance was predicted by macrostructure processing which integrates information from different sentences, as opposed to microstructure processing which includes the processing of words, syntax, and propositions. Experiment 1 revealed that slower readers require more time than faster readers to perform microstructure processing, but no differences were found for macrostructure components of reading. Experiment 2 revealed that variations in reading goals influence macrostructure processing but not microstructure processing. These findings suggest that functionally separate reading skills may be involved in microstructure versus macrostructure processing.
Article
Though negation is unique and central to human language, it has so far received little attention in cognitive neuroscience. The goal of the present study was to investigate the contrast between affirmative and negative sentences using fMRI, focusing on two central aspects, namely, (1) a semantic difference: affirmation is upward entailing, whereas negation is downward entailing; (2) a syntactic difference: negation involves more syntactic structure than affirmation. The behavioural data showed that negation significantly increased response times (but not the level of performance), even when negation was only in the preceding context to the response condition. The imaging results showed increased activation in the left premotor cortex from negation, compatible with rule-governed memory processing, and increased activation in the right supramarginal gyrus from affirmation, compatible with semantic processing. Finally, affirmation showed “default mode” activation in the cingulate cortex.
Article
Two experiments examining the influence of a story's structure on the comprehension of its sentences are presented. It was expected that sentences at high levels in a story would take longer to encode than those at low levels, either because cues to the sentences' roles exist within the story or because of differential difficulty of integrating the sentences into the prior context. Moreover, the greater density of new information early in stories might result in comprehension being affected by the serial position of a sentence within a story. The reading times for the individual sentences (or clauses) of stories were measured where a particular sentence appeared at one hierarchical (and/or serial) position in one story and at a different hierarchical (and/or serial) position in another story. In both experiments high-level sentences took longer to read than low-level ones and early-occurring sentences longer than late-occurring ones. Recall data supported the structural assignment of the critical sentences. These results were discussed both in terms of the initial hypotheses and in terms of W. Kintsch and T. A. van Dijk's (Psychological Review, 1978,85, 363–394) theory of text comprehension.
Article
The odds ratio (OR) is probably the most widely used index of effect size in epidemiological studies. The difficulty of interpreting the OR has troubled many clinical researchers and epidemiologists for a long time. We propose a new method for interpreting the size of the OR by relating it to differences in a normal standard deviate. Our calculations indicate that OR = 1.68, 3.47, and 6.71 are equivalent to Cohen's d = 0.2 (small), 0.5 (medium), and 0.8 (large), respectively, when disease rate is 1% in the nonexposed group; Cohen's d
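The correspondence between odds ratios and Cohen's d quoted in this abstract can be approximated with a short calculation. The sketch below is one plausible reading of "relating it to differences in a normal standard deviate", not a confirmed reproduction of the authors' procedure: it assumes that the nonexposed disease rate is converted to a standard normal quantile, that the exposed group's quantile is shifted by d, and that the odds ratio is computed from the two resulting rates. The function name `odds_ratio_for_d` and its parameters are hypothetical.

```python
from statistics import NormalDist


def odds_ratio_for_d(d, base_rate=0.01):
    """Odds ratio implied by a shift of d standard deviations.

    Assumption (not taken from the paper): the disease rate in each group
    is the lower tail of a standard normal, and the exposed group's
    threshold lies d standard deviations above the nonexposed one.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_nonexposed = std_normal.inv_cdf(base_rate)
    exposed_rate = std_normal.cdf(z_nonexposed + d)
    odds_exposed = exposed_rate / (1 - exposed_rate)
    odds_nonexposed = base_rate / (1 - base_rate)
    return odds_exposed / odds_nonexposed


if __name__ == "__main__":
    # Cohen's conventional small, medium, and large effect sizes.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: OR is roughly {odds_ratio_for_d(d):.2f}")
```

Under these assumptions, and with a 1% disease rate in the nonexposed group, the computed odds ratios come out close to the 1.68, 3.47, and 6.71 quoted in the abstract. Small differences in the last digit are to be expected from rounding.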
Article
The items on the private check list are specific problems involving the correct use of the natural language in which the RS is written. It includes incorrect grammar, incorrect word placement, and all kinds of ambiguities. The lists of grammatical and word-placement problems are similar despite the difference in the natural languages involved. The syntactic problems are symptoms of ambiguities in meaning - a grammatical problem occurs when part of a sentence disagrees with another, and each choice in the disagreement corresponds to a different meaning. The use of plural to describe a property of elements of a set or of sets makes it difficult to determine whether the property is that of each element or of the whole set. A specification inspector can certainly search for plural constructions in a specification to examine each for its danger. Best of all is for a specification writer not to write plural statements when describing properties of each element of a set.
Article
E. Lepore and B. Smith (eds), Handbook of Philosophy of Language, OUP.

I Generality in Natural Language

The first of our topics is the notion of quantifiers as expressions of generality. We have already observed that natural languages present us with a wide range of such expressions. We thus confront a number of questions, both foundational and descriptive: what are the semantics of expressions of generality, what sorts of basic semantic properties do they have, and what expressions of generality appear in natural language? One of the accomplishments of research over the last 25 years is to give interesting answers to these questions. Though many problems remain open, a great deal about the basic semantic properties of natural-language quantifiers is known. This is encapsulated in what is often known as generalized quantifier theory. This section will be devoted to the core of this theory. It should be noted at the outset that generalized quantifier theory is a large and well-dev