Book

Building the Data Warehouse

Authors: W. H. Inmon
... A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data for corporate decision-making [4,5]. Subject-oriented indicates that the data are organized around major subjects, such as the procurement, production, and sales of an enterprise's products. ...
... Queries on the warehouse can be categorized into three kinds: warehouse queries, cube schema queries, and cube data queries. However, Inmon criticized the significant disadvantages of the virtual data warehouse approach [5]. OLAP (Online Analytical Processing) is a set of techniques used to analyze the data in a data warehouse. ...
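The excerpt stops short of showing what such an OLAP analysis looks like in practice. Below is a minimal sketch using pandas, assuming a hypothetical sales fact table; the column names and figures are invented for illustration only.

```python
import pandas as pd

# Hypothetical fact table for a "sales" subject area: one row per transaction.
sales = pd.DataFrame({
    "region":  ["north", "north", "south", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [120.0, 80.0, 200.0, 50.0, 75.0],
})

# OLAP-style roll-up: aggregate the revenue measure along the
# region and quarter dimensions.
cube = sales.groupby(["region", "quarter"])["revenue"].sum()
print(cube)

# Slice: fix one member of the region dimension and inspect the rest.
print(cube.loc["south"])
```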
... A data warehouse (DW) is a repository that collects data from multiple heterogeneous data sources into a single multidimensional source for analysis purposes [57]. The main objective of the DW is to improve decision-making by providing greater insights into the organization's performance. ...
... Proposed DW-based recommender architecture (inspired by the typical three-tier data warehouse architecture proposed in [57]). ...
Article
Full-text available
Nowadays, manufacturers are shifting from a traditional product-centric business paradigm to a service-centric one by offering products that are accompanied by services, which is known as Product-Service Systems (PSSs). PSS customization entails configuring products with varying degrees of differentiation to meet the needs of various customers. This is combined with service customization, in which configured products are expanded by customers to include smart IoT devices (e.g., sensors) to improve product usage and facilitate the transition to smart connected products. The concept of PSS customization is gaining significant interest; however, there are still numerous challenges that must be addressed when designing and offering customized PSSs, such as choosing the optimum types of sensors to install on products and their adequate locations during the service customization process. In this paper, we propose a data warehouse-based recommender system that collects and analyzes large volumes of product usage data from similar products to the product that the customer needs to customize by adding IoT smart devices. The analysis of these data helps in identifying the most critical parts with the highest number of incidents and the causes of those incidents. As a result, sensor types are determined and recommended to the customer based on the causes of these incidents. The utility and applicability of the proposed RS have been demonstrated through its application in a case study that considers the rotary spindle units of a CNC milling machine.
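The paper's implementation is not reproduced in this listing; the following is only a sketch of the analysis flow the abstract describes (rank parts by incident count, then map incident causes to candidate sensor types). The part names, causes, and the cause-to-sensor table are all hypothetical.

```python
from collections import Counter

# Hypothetical usage/incident records pulled from the DW for products
# similar to the one being customized.
incidents = [
    {"part": "spindle_bearing", "cause": "overheating"},
    {"part": "spindle_bearing", "cause": "vibration"},
    {"part": "spindle_bearing", "cause": "overheating"},
    {"part": "tool_holder",     "cause": "vibration"},
]

# Hypothetical mapping from incident cause to a sensor type able to detect it.
CAUSE_TO_SENSOR = {"overheating": "temperature sensor",
                   "vibration":   "accelerometer"}

def recommend_sensors(incidents, top_n=1):
    """Rank parts by incident count, then recommend sensors per leading cause."""
    by_part = Counter(rec["part"] for rec in incidents)
    recommendations = {}
    for part, _ in by_part.most_common(top_n):
        causes = Counter(r["cause"] for r in incidents if r["part"] == part)
        recommendations[part] = [CAUSE_TO_SENSOR[c] for c, _ in causes.most_common()]
    return recommendations

print(recommend_sensors(incidents))
# {'spindle_bearing': ['temperature sensor', 'accelerometer']}
```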
... In particular for academia, data lakes come with the promise to provide solutions for several data management challenges at once. Similar to Data Warehouses (Devlin and Murphy, 1988; Inmon, 2005), data lakes aim at integrating heterogeneous data from different sources into a single, homogeneous data management system. This allows data holders to overcome the limits of disparate and isolated data silos and enforce uniform data governance. ...
Article
Full-text available
Data lakes are a fundamental building block for many industrial data analysis solutions and are becoming increasingly popular in research. Often associated with big data use cases, data lakes are, for example, used as central data management systems of research institutions or as the core entity of machine learning pipelines. The basic underlying idea of retaining data in its native format within a data lake facilitates a large range of use cases and improves data reusability, especially when compared to the schema-on-write approach applied in data warehouses, where data is transformed prior to the actual storage to fit a predefined schema. Storing such massive amounts of raw data, however, has its very own challenges, spanning from general data modeling and indexing for concise querying to the integration of suitable and scalable compute capabilities. In this contribution, influential papers of the last decade have been selected to provide a comprehensive overview of developments and obtained results. The papers are analyzed with regard to the applicability of their input to data lakes that serve as central data management systems of research institutions. To achieve this, contributions to data lake architectures, metadata models, data provenance, workflow support, and FAIR principles are investigated. Last but not least, these capabilities are mapped onto the requirements of two common research personae to identify open challenges. With that, potential research topics are determined that have to be tackled to make data lakes applicable as central building blocks for research data management.
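To make the schema-on-write versus schema-on-read contrast concrete, here is a minimal illustrative sketch; the record layout and validation logic are assumptions, not taken from any surveyed system.

```python
import json

raw_event = '{"sensor": "s1", "value": "21.5", "unit": "C"}'

# Schema-on-write (warehouse style): validate and transform before storing,
# so only records fitting the predefined schema reach the store.
def store_schema_on_write(line, table):
    rec = json.loads(line)
    table.append({"sensor": str(rec["sensor"]), "value": float(rec["value"])})

# Schema-on-read (data lake style): store the raw line as-is in its native
# format and defer interpretation to query time.
def store_schema_on_read(line, lake):
    lake.append(line)

def query_lake(lake):
    # Interpretation happens here, at read time.
    return [float(json.loads(l)["value"]) for l in lake]

table, lake = [], []
store_schema_on_write(raw_event, table)
store_schema_on_read(raw_event, lake)
print(table, query_lake(lake))
```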
... al. 2010]. The DW is a subject-oriented, integrated, non-volatile, and time-variant data repository [Inmon 2002; Turban et al. 2009; Kimball and Ross 2002; Ferreira et ...
Chapter
Full-text available
Among wireless communication technologies, Near Field Communication (NFC) has recently attracted the interest of both the scientific community and industry, and enjoys wide acceptance among users. These factors make NFC a promising communication technology. This chapter presents a theoretical study of the technology and its applications, as well as of the vulnerabilities, attacks, and countermeasures that aim to guarantee the security of the information exchanged by or stored on NFC-enabled devices. Beyond the theoretical study, a practical case study in the field of education demonstrates how to exploit a vulnerability in the communication between devices, along with the existing countermeasure. With this, we hope to deepen knowledge about the technology, contributing to its use …
... DW was presented by Inmon (2005) as a collection of data that supports decision-making processes: it is subject-oriented, integrated and consistent, shows its evolution over time, and is non-volatile. DW is defined by Golfarelli & Rizzi (2010) as a collection of methods, techniques, and tools used to support knowledge workers (senior managers, directors, managers, and analysts) in conducting data analyses that help with performing decision-making processes and improving information resources. ...
Article
Full-text available
Effect of Human Resource Management Practices on Achieving Strategic Objectives of Jordan Electric Power Company.
... They collect, transform, structure, and store organizational data in a repository that is subsequently used for analysis tasks [7]. Leading authors such as Inmon and R. Kimball and M. Ross [8], [9] state that a DW (data warehouse) is not a mere copy of transactional data on a different platform; rather, it has needs, objectives, customers, and rhythms profoundly different from those of operational systems. Another distinctive characteristic is that DWs preserve the historical context of the data, making it possible to analyze an organization's performance over time. ...
Conference Paper
Full-text available
ABSTRACT Today's organizations face increasingly complex challenges in terms of management and problem solving in order to achieve their objectives and goals; they are obliged to have information and knowledge that support decision-making beyond mere intuition. Business Intelligence (BI) is precisely the area of knowledge comprising the collection of methodologies, processes, architectures, and technologies that make it possible to generate this necessary and valuable evidence. The essential element is the datum, so it is necessary to clarify the terms data, information, and knowledge. A datum is a symbolic representation and has no semantic content, whereas information refers to a set of processed data that carries meaning; in turn, information that is synthesized, analyzed, and interpreted gives rise to knowledge. The latter two are the key pieces for sound decision-making. This subject was originally applied in large companies. Today, however, the state as well as micro and small enterprises also have the need, and the possibility, to incorporate it into their contexts.
... Structured data analytics entails the analysis of large quantities of structured data gathered by business or scientific applications. As mentioned earlier, structured data sources are managed by RDBMSes, and data warehouses are often used to integrate all the data sources (Inmon 2005). In terms of taxonomy, data analytics can be divided into (1) descriptive, (2) predictive, and (3) prescriptive analytics. ...
... A data warehouse may be defined as a data-driven decision support system that supports the decision-making process in a strategic sense and, in addition, operational decision-making [14]. The early implementations of decision support systems based on data extracted from operational systems date back to the late 80s, and in 1991 Bill Inmon, generally considered the father of data warehousing, published the book Building the Data Warehouse [8], in which DWH systems are described in terms of architecture and data modelling. Since the early 90s, big companies have been using DWH systems to support their strategic decision-making processes. ...
Chapter
The development of the digital economy is one of the priority areas for most countries, so the requirements for ICT specialists are changing and increasing as a consequence of this transformation. In this perspective, the strengthening, measurement, and assessment of digital competences are becoming crucial to ensure the quality and security of the digital products and services implemented. This paper proposes MMACK, a new meta-model for assessing ICT competencies and knowledge. This work also explores the feasibility of implementing such a model using data warehousing software.
... Departments organize and oversee their own databases (administrative, academic, marketing, financial, etc.), which control a large amount of common data [1]. The term data warehouse (DW) describes time-variant, subject-oriented, non-volatile, and integrated data that can be used to support strategic decision-making [2,3]. A DW accommodates a huge set of permanent historical data that is useful in administrative decision-making when it comes to data access and retrieval for time analysis, decision-making, and knowledge discovery [4]. ...
... Likewise, Figure 10 does not show which requirements data must satisfy in order to exploit the optimal usage potential postulated in the definition of data management. According to Zerlauth and de Haas, the data underlying data management should exhibit properties such as verifiability, completeness, correctness, timeliness, clarity, availability, and traceability (Zerlauth & de Haas, 1995) (Inmon, 1996). The data warehouse is thus a subject-oriented, integrated, non-volatile, and time-variant database for supporting management decisions. ...
Thesis
Full-text available
In recent years, data have increasingly become "raw materials". Companies are aware of this but so far use data almost exclusively to make processes, products, marketing, and controlling more effective. In the future, data already present in a company, or data that companies collect in the medium to long term, will be captured, stored, and analyzed cost-efficiently to a far greater extent than before. Data thus represent a further, as yet unvalued, factor in the valuation of a company. This thesis addresses the following questions: What is meant by the term data? To what extent can corporate data be delimited? Can such corporate data be regarded as an intangible asset? By which methods and procedures can a valuation of corporate data be carried out? To this end, the concept of data is first examined definitionally and then focused specifically on corporate data. On this basis, valuation-relevant criteria, the so-called data dimensions, are established. The second part of the thesis examines the classification of data as an intangible asset on the basis of legal provisions and outlines the methods that come into question for this purpose. This is followed by the synthesis of data, methods, and valuation possibilities, which culminates in an index-based valuation, the Thofla Index. The results show that (corporate) data can be classified as an intangible asset. A holistic valuation of corporate data along the data dimensions (origin, processing/retention, purpose of use, data quality, data quantity, relevance, competence, infrastructure, legal certainty) is, however, possible only to a limited extent, if at all, with the methods and approaches currently available for intangible assets. The index-based Thofla method covers all data dimensions. As an index-based method, it allows conclusions to be drawn above all over time, for example for management purposes and for benchmarking against other companies. A concrete monetary valuation of corporate data cannot be guaranteed by this method alone; it requires the inclusion of further methods as well as a thorough (data) analysis of the entire company in accordance with the procedure presented in this thesis. Only then can a valuation of the corporate data take place.
... Traditionally, a DSS incorporates all data relevant to the management of an organization into a specific repository used for analytical purposes named data warehouse. As defined in [2], a data warehouse is a ''subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process and business intelligence''. ...
Article
Full-text available
Nowadays, the data used for decision-making come from a wide variety of sources which are difficult to manage using relational databases. To address this problem, many researchers have turned to Not only SQL (NoSQL) databases to provide scalability and flexibility for On-Line Analytical Processing (OLAP) systems. In this paper, we propose a set of formal rules to convert a multidimensional data model into a graph data model (MDM2G). These rules allow conventional star and snowflake schemas to fit into NoSQL graph databases. We apply the proposed rules to implement star-like and snowflake-like graph data warehouses. We compare their performance to that of similar relational ones, focusing on the data model, dimensionality, and size. The experimental results show large differences between relational and graph implementations of a data warehouse. A relational implementation performs better for queries on a couple of tables, but conversely, a graph implementation is better when queries involve many tables. Surprisingly, the performance of star-like and snowflake-like graph data warehouses is very close. Hence, a snowflake schema could be used in order to easily accommodate new sub-dimensions in a graph data warehouse.
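The formal MDM2G rules themselves are not quoted in this listing; the sketch below only illustrates the general idea of mapping a star schema onto graph nodes and edges. The schema and the mapping conventions are assumptions, not the authors' actual rules.

```python
# Hypothetical star schema: one fact table referencing two dimensions.
star = {
    "fact": {"name": "sales", "measures": ["amount"],
             "dims": ["customer", "date"]},
    "dims": {"customer": ["name", "city"], "date": ["day", "month", "year"]},
}

def star_to_graph(star):
    """Map the fact table to a fact node, each dimension to a dimension node,
    and each foreign-key reference to an edge."""
    nodes, edges = [], []
    nodes.append(("Fact", star["fact"]["name"], star["fact"]["measures"]))
    for dim, attrs in star["dims"].items():
        nodes.append(("Dimension", dim, attrs))
    for dim in star["fact"]["dims"]:
        edges.append((star["fact"]["name"], "HAS_" + dim.upper(), dim))
    return nodes, edges

nodes, edges = star_to_graph(star)
print(nodes)
print(edges)  # [('sales', 'HAS_CUSTOMER', 'customer'), ('sales', 'HAS_DATE', 'date')]
```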
... Moreover, data marts are stars grouped by function, and a data warehouse is the set of all these data marts. There are three methods for grouping them in order to implement a data warehouse (EDD) [23]. ...
Thesis
Full-text available
Urban issues are at the top of development questions in Cameroon. The urban population grows rapidly without sufficient social infrastructure and services. This urban trend has many origins, chief among them the destitution of the population and bad governance. Its direct consequences are the lack of basic social structures, the degradation of the natural environment, the proliferation of spontaneous neighbourhoods, and the lack of complete and reliable urban data. Urban actors, especially managers, need reliable information to handle towns' problems smoothly, and that information should be easy to process. The advent of Information and Communication Technologies (ICT) has allowed the development of various systems. Geographical Information Systems (GIS) and data warehouses are, respectively, technologies for representing and managing mass information. Put together, they become a powerful tool for urban data management and decision-making support. The goal of this work was to use the strength arising from the combination of these two technologies. Therefore, after studying GIS, data warehouses and their coupling, and analysing the urban sector in Cameroon, we proposed a multidimensional model to manage data on constructions and networks. We also proposed geodecisional and GIS applications to support the implementation of the model. Throughout this work, we used free software. Keywords: urban question, urban data, GIS, Data Warehouse, geodecision, GIS free software.
... We cover Gartner's Logical Data Warehouse and Data Fabric concepts [3,4] as well as Dehghani's Data Mesh proposal [5]. Moving forward, we want to cover further architecture paradigms, including the classic Data Warehouse (both Kimball and Inmon style) [6,7], Lindstedt's Data Vault [8], and Data Lake as well as Lambda and Kappa architectures [9,10], and extend the model to clearly show the dependencies and shared elements as already indicated in figure 2 in the next section. This should ultimately lead to a pattern system similar to the GoF's software design patterns [11]. ...
... We have chosen the architecture proposed by Inmon, in which the data warehouse is a "subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions." [17]. Following this architecture, the data warehouse acts as a unified source where all data is integrated. ...
Article
Full-text available
This paper describes the development and implementation of an anesthesia data warehouse at the Lille University Hospital. We share the lessons learned from a ten-year project and provide guidance for the implementation of such a project. Our clinical data warehouse is mainly fed with data collected by the anesthesia information management system and hospital discharge reports. The data warehouse stores historical and accurate data at a granularity of one day for administrative data and of one second for monitoring data. Datamarts complete the architecture and provide secondary computed data and indicators, in order to execute queries faster and more easily. Between 2010 and 2021, 636 784 anesthesia records were integrated for 353 152 patients. We report the main concerns and barriers encountered during the development of this project and provide 8 tips for handling them. We have implemented our data warehouse in the OMOP common data model as a complementary downstream data model. The next step of the project will be to disseminate the use of the OMOP data model for anesthesia and critical care, and to drive the trend towards federated learning to enhance collaborations and multicenter studies.
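The abstract mentions datamarts holding secondary computed indicators so that queries run faster. A minimal sketch of that pattern follows, using SQLite; the table and indicator are invented and not taken from the Lille warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anesthesia_record (patient_id INT, year INT, duration_min REAL)")
conn.executemany("INSERT INTO anesthesia_record VALUES (?, ?, ?)",
                 [(1, 2020, 95.0), (2, 2020, 120.0), (3, 2021, 60.0)])

# Datamart pattern: precompute an indicator once, so dashboards query the
# small aggregate table instead of scanning the full fact table.
conn.execute("""CREATE TABLE dm_yearly_activity AS
                SELECT year,
                       COUNT(*)          AS n_records,
                       AVG(duration_min) AS avg_duration_min
                FROM anesthesia_record
                GROUP BY year""")

print(conn.execute("SELECT * FROM dm_yearly_activity").fetchall())
# [(2020, 2, 107.5), (2021, 1, 60.0)]
```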
... These business intelligence (BI) decision systems appeared with the introduction of the data warehouse (DW) by Bill Inmon in 1991. According to [1], "A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process". A DW is a system used for integrating, storing, and processing data from often heterogeneous data sources, in order to provide decision-makers with a multi-dimensional view. ...
Article
Full-text available
The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects is essentially based on the proper modeling of the ETL process. As there is no standard model for the representation and design of this process, several researchers have made efforts to propose modeling methods based on different formalisms, such as unified modeling language (UML), ontology, model-driven architecture (MDA), model-driven development (MDD), and graphical flow, which includes business process model notation (BPMN), colored Petri nets (CPN), Yet Another Workflow Language (YAWL), CommonCube, entity modeling diagram (EMD), and so on. With the emergence of Big Data, despite the multitude of relevant approaches proposed for modeling the ETL process in classical environments, part of the community has been motivated to provide new data warehousing methods that support Big Data specifications. In this paper, we present a summary of relevant works related to the modeling of data warehousing approaches, from classical ETL processes to ELT design approaches. A systematic literature review is conducted and a detailed set of comparison criteria are defined in order to allow the reader to better understand the evolution of these processes. Our study paints a complete picture of ETL modeling approaches, from their advent to the era of Big Data, while comparing their main characteristics. This study allows for the identification of the main challenges and issues related to the design of Big Data warehousing systems, mainly involving the lack of a generic design model for data collection, storage, processing, querying, and analysis.
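None of the surveyed modeling formalisms is reproduced in this listing; as a neutral reference point, here is the bare extract-transform-load cycle as a runnable sketch, with invented source rows and target schema.

```python
import sqlite3

def extract():
    # Extract: pretend these rows came from an operational source system.
    return [("2024-01-05", "north", "129.90"), ("2024-01-06", "south", "80.00")]

def transform(rows):
    # Transform: parse types and derive the month key used by the fact table.
    return [(day[:7], region, float(amount)) for day, region, amount in rows]

def load(rows, conn):
    # Load: append the conformed rows into the warehouse fact table.
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (month TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM fact_sales").fetchall())
```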
... Two researchers from IBM (Devlin & Murphy, 1988) coined the term information warehouse, and researchers in various IT companies in due course began experimenting with data warehouses. However, Bill Inmon, in his book Building the Data Warehouse (Inmon, 1992; Inmon, 2001), made data warehouses practical. Ralph Kimball, who has a PhD from Stanford, published The Data Warehouse Toolkit in 1996 (Kimball, 1996), which further sought to formalize data warehousing using a lot of practical examples. ...
Article
Full-text available
This study concentrated on designing, developing, and implementing a decision support system (DSS) to track the maintenance schedules of an organization's fleet of vehicles, such as a university's. The schedule takes into account the mobile and immobile vehicles at any point in time, the vehicles due for scheduled maintenance, the vehicles due for insurance renewal, etc. The database design was done using the entity-relationship modeling (ERM) technique, and subsequently a data warehouse (DWH) was developed that contained records of the vehicles such as insurance type, servicing type, make and model, etc. The DWH was implemented using MariaDB through phpMyAdmin as the administrative GUI. The developed DWH was then deployed as a Web application using the Yii PHP component framework to give the software a global reach. With that done, decisions as to which vehicle is mobile and which is immobile, which requires insurance renewal, which requires servicing, etc. at any point in time are properly tracked and regulated.
... According to (Inmon 1996), requirements are to be understood by the users who analyse the query results once the decisional system has been populated. The authors of (Golfarelli and Rizzi 2009) cite three success prerequisites for the data-driven approach: ...
Thesis
Data warehouses (DWs) are widely known for their powerful analysis capabilities, which serve either for investigating historic data or for predicting potentially continuous phenomena. However, their use is still in most cases limited to enterprises or governments, while, with the huge amounts of data produced and collected by Web 2.0 technologies, many other unusual users might benefit from analysing their data if DWs were properly dedicated to their specific needs. They might be association adherents, online community members, observatory volunteers, etc. Unlike in classical contexts, requirements engineering (RE) with volunteers lacks group cohesion and straightforward strategic objectives. This is because they come with different backgrounds and do not have an acknowledged representative leadership, which would very likely lead to multiple contradictory interpretations of the data and consequently to conflicting requirements. When stakeholders have divergent goals, it becomes problematic to maintain an agreement between them, especially when it comes to eliciting DW requirements whose future use is meant to serve as large an interested public as possible. In this work, we propose a new generic and participative DW design methodology that relies on a Group Decision Support System (GDSS) to support the collaboration of the engaged volunteers. We suggest in this methodology two RE scenarios: (i) using the GDSS for collaborative elicitation when groups of users with common objectives are identifiable, or (ii) using pivot tables and rapid prototyping formalisms when only individual volunteers are participating. Then, we reduce the number of the resulting models by fusing them based on their multidimensional (MD) similarities. The fused models require a further refinement that focuses on solving the remaining subject-matter inconsistencies, which are due either to erroneous definitions by unspecialized volunteers or to elements newly generated after the fusion that are conceptually admissible but irrelevant to the application domain. This is handled by the "collaborative resolution of requirement conflicts" step, for which we define two execution methods. The first is a simplified collaborative method in which we evaluate each model's MD elements against a reduced number of criteria that apply to each component's type, using an existing GDSS that allows the collaborative process to be executed. The second is a profile-aware method for which we suggest a more detailed set of evaluation criteria and an adaptable collaborative process, allowing its use by both crowdsourced and enterprise DW design projects. As GDSSs are designed to support a group engaged in a collective decision process, and as they are the main tool we rely on in two stages of our methodology (RE and collaborative refinement of the fused models), we also propose a new GDSS whose architecture adopts the concept of Thinklets, a well-known design pattern for collaborative processes. In addition to the reproducibility of group activities that the Thinklet concept offers, we have also implemented a recommender system prototype based mainly on a hierarchical division of decision categories and the automation of certain assistive functionalities, to allow guided and appropriate use of the system by the facilitator. This was done after a set of experiments conducted with real volunteer users engaged in solving risk management and uncertainty group problems.
The new GDSS that we suggest introduces a customized implementation of certain Thinklets in order to improve their suitability to our methodology, as well as for novice and inexperienced users from a more general perspective. In addition, we propose a new Thinklet, namely CollaborativeDW, that allows a fluid configuration and dynamic execution of our second refinement method, i.e., the profile-aware approach, and that we have tested with real users.
Article
Full-text available
Education for responsible consumption has become a challenge for the school systems of contemporary societies. Its development at the Primary Education level requires teachers to have personal and professional competences that allow them to promote training that makes consumers responsible and conscious of the socio-cultural, economic, and health impacts linked to the purchase, use, and disposal of goods and services. The aim of this work is to propose a model for teacher training related to responsible consumption and contextualized in Ecuadorian Primary Education. This model was developed considering theoretical and practical elements that guide the development of education for consumption in the Ecuadorian Primary General Education system. The proposal is based on a set of international and national guidelines, contents, and strategies aimed at education for consumption, as well as on processes that guarantee the motivation, participation, accompaniment, and socialization of educators throughout the course of action. Finally, it is expected that the development of personal and educational competences will allow teachers to conduct integrative educational projects on education for responsible consumption.
Chapter
Increasingly, scientists have begun to collect biological data in different information systems and database systems that are accessible via the internet and offer a wide range of molecular and medical information. Regarding human genome data, one important application of information systems is the reconstruction of molecular knowledge from life science data. In this review paper, we discuss major problems in database integration and present an overview of important information systems. Furthermore, we discuss the information reconstruction and visualization process based on such integrated life science data. These database integration tools will allow the prediction of, for instance, protein–protein networks and complex metabolic networks.
Article
The article provides an overview of research and development in databases from their appearance in the 1960s to the present time. The following stages are distinguished: emergence, formation and rapid development; the era of relational databases; extended relational databases; post-relational databases; and big data. At the formation stage, the systems IDS, IMS, Total and Adabas are described. At the stage of rapid development, issues of the ANSI/X3/SPARC database architecture, the CODASYL proposals, and concepts and languages of conceptual modeling are highlighted. For the era of relational databases, the results of E. Codd's scientific activities, the theory of dependencies and normal forms, query languages, experimental research and development, optimization and standardization, and transaction management are covered. The extended relational databases stage is devoted to temporal, spatial, deductive, active, object, distributed and statistical databases, array databases, database machines and data warehouses. The next stage discloses the problems of post-relational databases, namely NoSQL, NewSQL and ontological databases. The sixth stage is devoted to the causes of the emergence, characteristic properties, classification, principles of work, methods and technologies of big data. Finally, the last section provides a brief overview of database research and development in the Soviet Union.
Chapter
Data Spaces form a network for sovereign data sharing. In this chapter, we explore the implications that the IDS reference architecture has for typical scenarios of federated data integration and question answering processes. After a classification of data integration scenarios and their special requirements, we first present a workflow-based solution for integrated data materialization that has been used in several IDS use cases. We then discuss some limitations of such approaches and propose an additional approach based on logic formalisms and machine learning methods that promises to reduce data traffic as well as security and privacy risks, while helping users to select more meaningful data sources.
Article
One of the important aspects of managing and accelerating processes and operations in databases and data warehouses is the ETL process: extracting, transforming, and loading data. Without optimizing these processes, realizing a data warehouse project is costly, complex, and time-consuming. This paper provides an overview of methods for optimizing the performance of ETL processes and shows that the most important indicators of an ETL system's operation are the time and speed of data processing. The generalized structure of ETL process flows is considered, an architecture for ETL process optimization is proposed, and the main methods of parallel data processing in ETL systems, which can improve performance, are presented. The performance of ETL processes for data warehouses, the most relevant problem today, is considered in detail.
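The article's optimization architecture is not reproduced here; the sketch below merely illustrates the kind of data-parallel transformation step it refers to, splitting a batch of records across worker processes. The batch contents and the transform are placeholders.

```python
from concurrent.futures import ProcessPoolExecutor

def transform(record):
    # CPU-bound per-record transformation (placeholder logic).
    return {"id": record["id"], "value": record["value"] * 1.2}

def parallel_transform(records, workers=4):
    # Partition the batch across worker processes; result order is preserved.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, records, chunksize=64))

if __name__ == "__main__":
    batch = [{"id": i, "value": float(i)} for i in range(1000)]
    out = parallel_transform(batch)
    print(len(out), out[:2])
```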
Article
Full-text available
This paper adopts the concepts of observatory and competitive intelligence (CI) to model a system that will generate better insights for decision-makers in the solid waste industry. The first part of this work is to design and develop a data warehouse (DWH) of solid waste statistics using data assembled from disparate sources. Our design methodology is the entity-relationship diagram (ERD), and our implementation tool is MySQL running on phpMyAdmin. The second part of our work is to turn the developed DWH into a Web application using the Yii PHP component framework. Our findings indicate that applying both the observatory and CI concepts leads to better insights for decision-makers and hence better organizational performance.
Conference Paper
Very recently, the ESEAP mutual authentication protocol was designed to avoid the drawbacks of Wang et al.'s protocol, and its authors claim, using informal analysis, that the protocol protects against all kinds of security threats. This work investigates the ESEAP protocol from a security point of view and finds that the scheme is not fully protected against stolen-verifier attacks and does not provide user anonymity. Furthermore, the protocol has user identity issues, i.e., the server cannot figure out the user's identity during the authentication phase. Later we discuss the inconsistencies in the security analysis of ESEAP presented by RESEAP.
Chapter
Requirement’s data models’ quality is one of the main data models that influence quality of information stored in data warehouse to take major decisions in organization. Thus, it becomes significant for an organization to maintain and assure the information quality of data warehouse. Very few proposals were seen in the literature for assuring quality of requirements data model of DW. However, no theoretical validation of DW requirements metrics using Zuse’s framework was seen in the literature. Hence, in this paper, theoretical validation of requirements traceability metrics (based on agent goal decision information model) is done by applying Zuse’s framework. Results indicate that all traceability metrics are valid and correct. Thus, requirements traceability metrics can be used for assuring quality of DW requirements model.KeywordsAgent goal decision information modelData warehouse qualityRequirements engineeringRequirements traceability metricsTheoretical validationZuse’s framework
Article
This work sheds light on the importance of budget transparency at the national level through a study of the SIGA BRASIL portal. The objective is to understand how the operation of this electronic government tool helps to track the spending of public funds and how, through it, it is possible to combat and monitor corruption practices in the public sector. The work also seeks to answer questions about citizens' access to this system and whether it has been delivering satisfactory results within its field of action. To achieve this objective, the article draws on the studies of the following authors: Eugenio Raul Zaffaroni (1990), José Vitor Lemes Gomes (2016), Paulo Sérgio Sabino de Araújo (2008), Augustinho Paludo (2016) and Rodrigo Monteira da Silva (2016). The methodology used was descriptive and qualitative in nature, with data collection done through the indirect document technique and data analysis following the document analysis technique. The main results show that the SIGA BRASIL portal is accessed more by a group of people classed as specialists in public budgeting than by society in general. The system contributes to the fight against corruption by issuing reports on budget processes, enabling greater regulation and inspection by society and public authorities. Furthermore, this portal proves to be an important transparency tool and fulfills its role effectively.