Article · PDF Available

Entity-relationship and object-oriented data modeling—An experimental comparison of design quality

Authors: P. Shoval and S. Shiran

Abstract and Figures

We compare EER and OO data models from the point of view of design quality. Quality is measured in terms of (a) correctness of the conceptual schemas being designed, (b) time to complete the design task, and (c) designers' preferences between the models. Results of an experimental comparison of the two models reveal that the EER model surpasses the OO model for designing unary and ternary relationships, that it takes less time to design EER schemas, and that designers prefer the EER model. We conclude that even if the objective is to implement an OO database schema, the recommended procedure is to: (1) create an EER conceptual schema, (2) map it to an OO schema, and (3) augment the target schema with behavioral constructs that are unique to the OO approach.
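To make the recommended three-step procedure concrete, here is a minimal sketch (ours, not the authors'; all entity, attribute, and method names are invented) in which a one-to-many EER relationship is mapped to OO classes and the target schema is then augmented with behavior:

```python
# Step 1 (EER view): Department 1:N Employee via a "works-in"
# relationship. Steps 2-3 below are one plausible mapping, not the
# paper's prescribed one.
from dataclasses import dataclass, field

@dataclass
class Department:
    dept_id: str                                  # identifier attribute
    name: str                                     # descriptor attribute
    employees: list["Employee"] = field(default_factory=list)

    # Step 3: augment the mapped schema with OO-specific behavior.
    def headcount(self) -> int:
        return len(self.employees)

# Step 2: the 1:N relationship becomes a reference on the N-side
# and a collection on the 1-side.
@dataclass
class Employee:
    emp_id: str
    name: str
    department: Department | None = None

    def assign_to(self, dept: Department) -> None:
        # keep both sides of the mapped relationship consistent
        if self.department is not None:
            self.department.employees.remove(self)
        self.department = dept
        dept.employees.append(self)
```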
... 12, No. 1, [ShoShi97] Shoval P. and Shiran S. (1997), "Entity-Relationship and Object-Oriented data modeling - an experimental comparison of design quality", Data & Knowledge Engineering, Vol. 21, pp. ...
... Batra et al. [7] were the first to refine the concept of correctness by measuring the correctness of various facets or structural elements of the model (entities, identifiers, descriptors, categories, and five different types of relationships: unary, binary one-to-many, binary many-to-many, ternary one-to-many-to-many, and ternary many-to-many-to-many). The same facet structure was used later in [12] and [45]. Kim & March [30] divided the analysis of model correctness into syntactic and semantic categories: syntactic correctness refers to users' ability to understand and use the constructs of the modeling formalism, whereas semantic correctness is the extent to which the data model corresponds to the underlying semantics of the problem domain. ...
... The user attitudes measured within this area of research are confidence [26], preference to use a certain model [43] [45], perceived usefulness of the modeling approach [30], and perceived ease-of-use [7] [24] [30]. ...
... Little is known about what effects the different decisions may have, with two notable observations: binaries vs. n-aries, and plain relationships vs. also having aggregation in the language. The latter was observed for UML vs. EER and ORM2 [11], and the former for UML [55]. N-aries in UML class diagrams are hard to read due to the look-across notation [55], and they use a different visual element than a binary association (diamond vs. line), and are therefore used less frequently than in EER and ORM2. The interested reader is referred to [18] for a comprehensive explanation of the philosophical aspects. ...
Article
Full-text available
Multiple logic-based reconstructions of conceptual data modelling languages such as EER, UML Class Diagrams, and ORM exist. They mainly cover various fragments of the languages, and none are formalised such that the logic applies simultaneously to all three modelling language families as a unifying mechanism. This hampers interchangeability, interoperability, and tooling support. In addition, due to the lack of a systematic design process for the logic used in the formalisation, hidden choices permeate the formalisations and have rendered them incompatible. We aim to address these problems, first, by structuring the logic design process in a methodological way. We generalise and extend the DSL design process to apply to logic language design more generally and, in particular, incorporate an ontological analysis of language features in the process. Second, we specify minimal logic profiles availing of this extended process, including the ontological commitments embedded in the languages, the evidence gathered of language feature usage, and computational complexity insights from Description Logics (DL). The profiles characterise the essential logic structure needed to handle the semantics of conceptual models, therewith enabling the development of interoperability tools. There is no known DL language that matches exactly the features of those profiles, and the common core is small (in the tractable DL ALNI). Although hardly any inconsistencies can be derived with the profiles, this is promising for scalable runtime use of conceptual data models.
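For illustration (our toy example, not taken from the article): assuming an invented binary relationship enrolled, typed Student-to-Course, its typing constraints and a mandatory-participation constraint are expressible with ALNI constructs alone (value restrictions, inverse roles, unqualified number restrictions):

```latex
% Toy encoding; the names Student, Course, and enrolled are invented.
\begin{align*}
\top &\sqsubseteq \forall \mathit{enrolled}.\mathit{Course} \\      % range of enrolled
\top &\sqsubseteq \forall \mathit{enrolled}^{-}.\mathit{Student} \\ % domain, via the inverse role
\mathit{Student} &\sqsubseteq {\geq} 1\, \mathit{enrolled}          % each student is enrolled in something
\end{align*}
```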
... Ignorance of limitations ensures that no change towards proper correction is possible. Two common examples that have been investigated and shown to affect modelling ability and precision are the improved understanding by the modeller with look-here vs. look-across syntax notation for n-ary relationships [5] and the increased use and disambiguation of part-whole relations when parthood is a primitive in the language [6]. ...
... There is scant research on the effects of choosing one option over the other, with one notable observation in the related area of conceptual modelling: binaries vs. n-aries (Item 2c) and plain relationships vs. also having aggregation (roughly parthood, Item 2a) do indeed make a difference, at least for UML vs. ER and ORM: UML class diagrams have significantly more aggregation associations than ER and ORM diagrams have, yet fewer n-aries [6]. The former is attributed to aggregation being a separate element, and the latter at least partially to the obstructions to drawing and understanding n-aries posed by UML's graphical notation [5], compared to ER and ORM, which use the same notation for both binaries and n-aries and have no primitive for parthood. Parthood also features prominently in ontologies represented in the OBO format, where it was a primitive [22]; the absence of such a primitive in OWL might explain the well-known is-a/part-of confusion among novice ontologists. ...
Chapter
Full-text available
Multiple ontology languages have been developed over the years, which brings afore two key components: how to select the appropriate language for the task at hand, and language design itself. This engineering step entails examining the ontological ‘commitments’ embedded in the language, which, in turn, demands insight into what effects philosophical viewpoints may have on the design of a representation language. But what sorts of commitments, each with an underlying philosophical point of view, should one be able to choose from, and which philosophical stances have a knock-on effect on the specification or selection of an ontology language? In this paper, we provide a first step towards answering these questions. We identify and analyse ontological commitments embedded in logics, or that could be, and show that they have been taken in well-known ontology languages. This contributes to reflecting on the language as an enabler or inhibitor to formally characterising an ontology or an ontological investigation, as well as to the design of new ontology languages following the proposed design process.
... Each data type has a value, a unit, and an uncertainty, and belongs to an entity instance [37]. Different data models can be created along the DT life cycle, and different models may share the same entities, which may raise challenges of semantics, interoperability, scalability, quality, etc. Best practices for creating high-fidelity data models include Entity Relationship Diagrams (ERD) [41], Data Flow Diagrams (DFD), the Unified Modeling Language (UML) [8], ontology development, etc. Implementing the conceptual, logical, and physical data models requires data integration and interoperability; the subsequent sections discuss DT data integration and interoperability. ...
Chapter
Digital Twin (DT) provides a digital representation of a real-world entity (process or product) that is continuously synchronized with a specified frequency. In this regard, DT utilizes a set of models that capture the various aspects of the real system to provide a deeper understanding and analysis of its real counterpart. The data within the DT holds paramount significance and serves as the foundation for model updating, refining, interoperability, validity, usability, etc. Accordingly, DT requires rigorous data management throughout its entire life cycle. This paper explores data knowledge areas related to DT (i.e., data governance, architecture, modeling, integration, interoperability, quality, uncertainty, visualization, and security), highlights their best practices, and proposes a Data Management Framework for Digital Twin (DMFDT) to facilitate a better understanding of the DT data-related requirements and proven practices. The DMFDT is validated against a high-level DT architecture, and a case study is presented via a DT developed to study the mobility system at the University of Bordeaux in France.
... Finally, we scored each model. We awarded a "good" point for each construct that was presented appropriately, and a "bad" point for each inappropriate presentation of a construct (similar to Shoval & Shiran, 1997). The constructs we analyzed and scored are the constructs listed in Table 1 (i.e., actions, roles, start/end points, flows, and all types of nodes). ...
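A minimal sketch (our illustration, not the study's actual procedure) of the good/bad scoring described above: one judgement per construct occurrence, tallied per construct type. The construct names follow the text; the example data is invented.

```python
# Tally "good" and "bad" points per construct type from a list of
# per-construct appropriateness judgements.
from collections import Counter

CONSTRUCTS = {"action", "role", "start/end", "flow", "node"}

def score(judgements: list[tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """judgements: (construct, appropriate?) pairs.
    Returns (good, bad) point totals per construct type."""
    good, bad = Counter(), Counter()
    for construct, ok in judgements:
        if construct not in CONSTRUCTS:
            raise ValueError(f"unknown construct: {construct}")
        (good if ok else bad)[construct] += 1
    return {c: (good[c], bad[c]) for c in good.keys() | bad.keys()}

# Two flows modelled appropriately, one role modelled inappropriately:
print(score([("flow", True), ("flow", True), ("role", False)]))
# e.g. {'flow': (2, 0), 'role': (0, 1)}
```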
Article
Full-text available
Accurate process modeling is critical to the successful design of information systems. Therefore, learning to design correct, complete, and irredundant process models is an important part of training for systems analysts, yet it is very challenging, especially for novice analysts. To teach high-quality modeling skills, it is essential to identify the common difficulties encountered in designing process models. Motivated by this insight, we formulated two research objectives: (1) identify the errors made by novices during process modeling, and analyze and classify them in light of three quality criteria—completeness, irredundancy, and correctness; (2) identify the most common errors, particularly the most persistent ones, that is, those most resistant to training. To this end, we analyzed 525 models built by 181 students (two or three models per student) during an academic course. We classified the students’ modeling errors, based on the principles of the modeling language, and carried out a frequency analysis, wherein we counted the prevalence of each error type. Our analysis produced a four-layer hierarchical classification of errors with 52 elements, including 38 error categories, subcategories, and irreducible types. We also identified the most common and most persistent error categories, both of which pertained mainly to difficulties in abstracting from a given scenario. This hierarchical classification plays an important role in establishing ways to improve the quality of process models designed by systems analysts, especially novices. Moreover, identifying persistent errors and “cracking” them is an essential step in designing a learning methodology that will help novice analysts to recognize such errors and, indeed, avoid them in the first place.
... Of the 38 items, 8 presented significant differences favouring OPM and 2 favouring OMT. The authors trace their choice of grading scheme to earlier work by Shoval and Shiran [40], who compared two data modelling techniques: extended entity relationship (EER) and object-oriented (OO). That grading scheme had nine items, with significant differences found for two of them, favouring EER. ...
Article
Full-text available
Software-centric organisations design a loosely coupled organisation structure around strategic objectives, replicating this design in their business processes and information systems. Nowadays, dealing with business strategy in a model-driven development context is a challenge, since key concepts such as the organisation's structure and its strategic ends and means have mostly been addressed at the enterprise architecture level, for the strategic alignment of the whole organisation, and have not been included in MDD methods as a requirements source. To overcome this issue, researchers have designed LiteStrat, a business strategy modelling method compliant with MDD for developing information systems. This article presents an empirical comparison of LiteStrat with i*, one of the most widely used models for strategic alignment in an MDD context. The article contributes a literature review on the experimental comparison of modelling languages, the design of a study for measuring and comparing the semantic quality of modelling languages, and empirical evidence of the differences between LiteStrat and i*. The evaluation consists of a 2 × 2 factorial experiment recruiting 28 undergraduate subjects. Significant differences favouring LiteStrat were found for the models' accuracy and completeness, while no differences in modeller efficiency and satisfaction were detected. These results yield evidence of the suitability of LiteStrat for business strategy modelling in a model-driven context.
Chapter
The approach of diagramming or modelling conventions for each subject domain that we saw for biological models doesn't scale to all subject domains. Yet, other disciplines use or want to use models too. To overcome this challenge, we take a turn into computing and application development, since clearly what they do is being used across very many subject domains and it works somehow. The solution advanced here is conceptual data modelling languages for database and application design. This chapter first describes one such modelling strategy, Entity-Relationship diagrams with their extensions, and two similar modelling languages that capture the declarative ‘what’ of a domain. This is richly illustrated with examples on topics such as books, management, cars, and molecules. Computing has tried and tested procedures for developing such models, including the Conceptual Schema Design Procedure, which is described afterwards. The main procedures are illustrated with conceptual model development for a prospective scientific database about data for our dancing lyrebirds. While solving limitations of biology diagrams, a changing Information Technology landscape may be demanding more than traditional conceptual data models currently deliver, and so also here, we close with a brief section on limitations.
Chapter
Software systems execute tasks that depend on different types of resources. The variability of resources hinders the ability of software systems to execute important tasks. For example, in automated warehouses, malfunctioning robots could delay product deliveries and cause financial losses due to customer dissatisfaction. Resource-driven adaptation addresses the negative implications of resource variability. Hence, this paper presents a task modelling notation called SERIES, which is used for representing task models that support resource-driven adaptation in software systems. SERIES is complemented by a tool that enables software practitioners to create and modify task models. SERIES was evaluated through a study with software practitioners. The participants of this study were asked to explain and create task models and then provide their feedback on the usability of SERIES and the clarity of its semantic constructs. The results showed very good user performance in explaining and creating task models using SERIES. These results were reflected in the feedback of the participants and the activities that they performed using SERIES. Keywords: Task modelling notation; Resource-driven adaptation
Chapter
The emergence of XML as the de facto standard for data exchange on the World Wide Web and the increased popularity of XML in business applications have spurred research on ways to generate well-formed XML documents and to store and maintain them in databases. A good schema, such as an XML Schema, is undeniably needed to define the syntax and structure of XML instances and to ensure data integrity. Nevertheless, such schemas serve as logical models rather than conceptual models, in which the semantics of the underlying documents are hardly expressed. In this paper, therefore, the authors propose X-CM, a new conceptual model for XML, as a mechanism to model the components of XML conceptually and to express the underlying semantics explicitly. First, the authors review the semantics and structure of existing conceptual modeling approaches. Then, the authors present the X-CM modeling constructs and implement X-CM in a university-based scenario. Lastly, the authors summarize the evaluation results and the comments on the proposed model provided by XML database experts and evaluators.
Article
The paper describes our current research activities and results related to developing knowledge-based systems to support the creation of entity-relationship (ER) models. We based the derivation of an ER model in textual form on translation from one language into another, that is, from a controlled English natural language into the formalized language of an ER data model. Our translation method consists of rules that translate parts of sentential forms into ER model constructs, based on the textual and character patterns detected in the business descriptions. To enable the computer analyses necessary for creating the translation mechanisms, we created a linguistic corpus that contains the business descriptions and the texts of other business materials. From the corpus, we then created a specific dictionary and linguistic rules to automate the translation of the business descriptions into the ER data model language. Before that, however, the corpus was enriched by annotating the words related to ER data model constructs. In this paper, we also present the main issues uncovered during the translation process and offer a possible solution with a utility evaluation: applying information-extraction performance measures to a set of sentences from the corpus.
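As a toy illustration of pattern-based translation (ours, not the authors' actual rules; the sentence template and the Relationship class are invented), a single sentential-form rule can map a controlled-English sentence to an ER construct:

```python
# One invented translation rule: "Each <E1> <verb>s one or more <E2>s."
# maps to a binary relationship between two entity types.
import re
from dataclasses import dataclass

@dataclass
class Relationship:
    name: str
    entity1: str
    entity2: str

RULE = re.compile(
    r"Each (?P<e1>[A-Za-z]+) (?P<verb>[a-z]+)s one or more (?P<e2>[A-Za-z]+)s\."
)

def translate(sentence: str) -> Relationship | None:
    m = RULE.match(sentence)
    if m is None:
        return None  # sentence does not match this sentential form
    return Relationship(m["verb"], m["e1"].capitalize(), m["e2"].capitalize())

print(translate("Each department employs one or more persons."))
# Relationship(name='employ', entity1='Department', entity2='Person')
```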
Article
Full-text available
A database conceptual schema serves as a communication medium between professional analysts/designers and users who wish to comprehend and validate the conceptual schema. The conceptual schema is usually presented in a diagrammatic form that follows a specific semantic data model. The extended entity-relationship (EER) model is one of the most commonly used models, but it is being “threatened” by the object-oriented (OO) approach, which is penetrating the areas of systems analysis and design, as well as data modeling. The issue of which of the two data models is better for modeling reality and is easier to comprehend is still an open question. Our response to this question was to conduct a controlled experiment in which two groups of users, each trained to use one of the models, were tested for the comprehension of equivalent schema diagrams. Comprehension was measured by the number of correct answers to questions that addressed different constructs of the models. The results of the experiment reveal that there is no significant difference in the comprehension of facts dealing with attributes of entities or objects and with binary relationships, but facts dealing with ternary relationships are significantly easier to comprehend with the EER model. Comprehension of other, unclassified facts, however, is easier with the OO model. We propose a special symbol for objects representing ternary and higher-order relationships in order to overcome this weakness of OO diagrams.
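The weakness addressed by the proposed symbol is usually worked around in OO code by reifying the ternary relationship as a class of its own. The sketch below (our illustration using the classic supplier/part/project example, not taken from the article) shows that idea:

```python
# Reifying a ternary relationship as a first-class object. All names
# are invented; this illustrates the general technique only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Supplier:
    name: str

@dataclass(frozen=True)
class Part:
    name: str

@dataclass(frozen=True)
class Project:
    name: str

@dataclass(frozen=True)
class Supply:
    """One fact of the ternary relationship: which supplier ships
    which part to which project. No combination of the three binary
    links carries the same information."""
    supplier: Supplier
    part: Part
    project: Project
    quantity: int

fact = Supply(Supplier("Acme"), Part("bolt"), Project("bridge"), 500)
```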
Article
Full-text available
The purpose of this article is to report the design and results of a study that was conducted to test whether the use of a semantic model instead of a relational model results in superior end-user performance. The semantic model used was the extended entity-relationship model. A pilot study was conducted in November 1987. The main purpose of this pilot was to identify any procedural problems; the data collected was not used for analysis. The data for the main study was collected during February 1988 and September 1988. Prior research is reviewed, and the methodology of the present study is described. Seven hypotheses addressed by the study are presented. Five hypotheses were relationship-based, and one each pertained to identifiers and perceived ease of use. The results are presented and interpreted. The EER model was found to lead to better performance, but was not perceived as significantly easier to use than the relational model.
Article
While implementation/logical data models have been extensively studied and reported on, there is relatively less attention on conceptual data models, especially from an end-user empirical perspective. Conceptual models are better suited for end-users due to their richness in semantic expressiveness and user-oriented features, but usually are not directly implemented. In this article, we examine three conceptual models: the data structure diagram, the entity-relationship model, and the object-oriented model, from the viewpoint of end-users. Results of two empirical studies, one experimental and one survey, are described. A comparative examination of the three data models on comprehension, efficiency, productivity, and a whole host of other characteristics has been made. The general evidence from the experimental study is that user performance is much superior in terms of comprehension, efficiency, and productivity with the object-oriented model than with the data structure diagram or the entity-relationship model. The second study suggests that this clear user preference for the OO model diminishes with increased computer and database experience. Given the explosive growth in recent years of end-user computing and the use of databases, the findings of this study should be of great concern to users as well as information systems specialists.
Article
This research compares success in developing conceptual data models with the extended entity relationship (EER) model and Kroenke's object-oriented model. A laboratory study was used to evaluate model correctness for 38 subjects divided into two equally sized groups, where each group was trained in one of the modeling methods. Modeling correctness is measured in terms of eight different facets of a conceptual data model: (1) entities/objects, (2) attribute/property identifiers, (3) categories, (4) unary one-one relationships, (5) binary one-many relationships, (6) binary many-many relationships, (7) ternary one-many-many relationships, and (8) ternary many-many-many relationships. The EER model provided significantly improved performance for the attribute/property identifier, unary one-one relationship, and binary many-many relationship facets.
Chapter
The Entity-Relationship model (ER model) views data in the form of entities and relationships. The Object-Oriented model (OO model) views data as classes, types and their subtypes. A mapping procedure that considers various features of the ER model and transforms the ER schema and its associated relational schema into an OO schema is proposed. The mapping rules are illustrated with appropriate examples. A procedure for mapping constraints on the ER database into the OO schema is then discussed. Finally, OO representations are developed for ER schema operations.
Article
Our objective in this paper is to provide a thorough understanding of the usability of data management environments, with the aim of conducting research in this area. We do this by synthesizing the existing literature that pertains to (i) data modelling as a representation medium and (ii) query interface evaluation in the context of data management. We were motivated by several trends that are prevalent in the current computing context. First, while there seems to be a proliferation of new modelling ideas proposed in the literature, commensurate experimental evaluation of these ideas is lacking. Second, there appears to exist a significant user population that is quite adept at working in certain computing environments (e.g. spreadsheets) with a limited amount of computing skills. Finally, the choices in terms of technological platforms that are now available to implement new software designs allow us to deal with the implementation issue more effectively. The outcomes of this paper include a delineation of what constitutes an appropriate conceptualization of this area and a specification of research issues that tend to dominate the design of a research agenda.
Article
We describe an algorithmic method for transforming a binary-relationship (BR) conceptual schema to an object-oriented (OO) database schema. The BR schema is a semantically rich diagram that represents the reality being modeled in terms of objects, relationships and constraints. It is easy to understand and serves as a communication tool between users and designers. Therefore it can be created in the early stages of system development, and later on be transformed into a specific OO database schema. The transformation method employs a multi-stage algorithm, which first identifies the essential objects in the BR schema, together with their relationships and constraints. These are then mapped to object classes, attributes, and constraints, maintaining the semantics and all types of constraints present in the conceptual schema.
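A condensed sketch of such a staged transformation (our illustration; the data structures and the essentiality criterion are invented stand-ins, not the paper's algorithm): stage 1 identifies the essential objects, and stage 2 maps them to classes and turns binary relationships into typed reference attributes.

```python
# Invented two-stage BR-schema-to-OO-schema transformation.
from dataclasses import dataclass, field

@dataclass
class BRSchema:
    objects: set[str]                          # all object names
    relationships: list[tuple[str, str, str]]  # (name, from_obj, to_obj)

@dataclass
class OOClass:
    name: str
    attributes: dict[str, str] = field(default_factory=dict)  # attr -> type

def transform(br: BRSchema) -> dict[str, OOClass]:
    # Stage 1: identify essential objects (here: any object that
    # participates in a relationship -- a stand-in criterion).
    essential = {o for _, a, b in br.relationships for o in (a, b)}
    classes = {o: OOClass(o) for o in br.objects & essential}
    # Stage 2: map each relationship to a reference attribute on its source.
    for name, src, dst in br.relationships:
        if src in classes and dst in classes:
            classes[src].attributes[name] = dst
    return classes

br = BRSchema({"Author", "Book", "Shelf"}, [("wrote", "Author", "Book")])
for c in transform(br).values():
    print(c)  # e.g. OOClass(name='Author', attributes={'wrote': 'Book'})
```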