Article

There is no AI without data

Authors:
Christoph Gröger

Abstract

Artificial intelligence (AI) constitutes a game changer across all business sectors. This holds particularly true for industrial enterprises due to the large amounts of data generated across the industrial value chain. However, AI has not yet delivered on its promises in industrial practice. The core business of industrial enterprises is not yet AI-enhanced. In fact, data issues constitute the main reasons for the insufficient adoption of AI. This paper addresses these issues and rests on our practical experiences with the AI enablement of a large industrial enterprise. As a starting point, we characterize the current state of AI in industrial enterprises, which we call “insular AI”. This leads to various data challenges limiting the comprehensive application of AI. We particularly investigate challenges in data management, data democratization and data governance resulting from real-world AI projects. We illustrate these challenges with practical examples and detail related aspects, e.g., metadata management, data architecture and data ownership. To address the challenges, we present the data ecosystem for industrial enterprises. It constitutes a framework of data producers, data platforms, data consumers and data roles for AI and data analytics in industrial environments. We assess how the data ecosystem addresses the individual data challenges and highlight open issues we are facing in the course of the enterprise-scale realization of the data ecosystem. In particular, the design of an enterprise data marketplace as the pivotal point of the data ecosystem is a valuable direction for future work.


... Local, national and international policymakers have also started introducing various regulations that indirectly or directly affect such AI/ML applications [1], including the "right to be informed" of the EU GDPR and the EU AI Act, leading to further discussions of the risks and appropriate actions for research, development and governance of such applications [7]. Thus, efforts to introduce and apply AI/ML in industrial scenarios [8][9][10][11] must consider the respective concerns. ...
... They also raise concerns related to various levels of automation, echoing the concerns and guidelines formulated from the human-automation collaboration [12,78,79] and visualization [80] perspectives. Gröger provides further insights on the data challenges for AI in industrial applications, while acknowledging various user roles and tools, e.g., for data discovery and exploration as part of the data engineering process [10], whereas De Silva et al. describe a reference architecture for AI-based intelligent industrial informatics, where visualization plays an important role [8]. ...
... The visualization community could also refer to the voices from the AI/ML community that recognize the need to focus not only on the algorithm/model, but also on data and humans in order to improve the overall workflows [10,61]; furthermore, the arguments made for human-centered AI and ML are very much relevant here, such as the works by Shneiderman [33], Sacha et al. [34] and Andrienko et al. [35], among others. • Motivation for VA Researchers The visualization community might also have its reservations about collaborations with a perceived low level of visualization research novelty [28,70,100], considering that the role of applied papers and contributions in the community has been debated over time [123,124]. ...
Article
Full-text available
As the levels of automation and reliance on modern artificial intelligence (AI) approaches increase across multiple industries, the importance of the human-centered perspective becomes more evident. Various actors in such industrial applications, including equipment operators and decision makers, have their needs and preferences that often do not align with the decisions produced by black-box models, potentially leading to mistrust and wasted productivity gain opportunities. In this paper, we examine these issues through the lenses of visual analytics and, more broadly, interactive visualization, and we argue that the methods and techniques from these fields can lead to advances in both academic research and industrial innovations concerning the explainability of AI models. To address the existing gap within and across the research and application fields, we propose a conceptual framework for visual analytics design and evaluation for such scenarios, followed by a preliminary roadmap and call to action for the respective communities.
... Efficiency improvements are sought by focusing on data markets' organization, trust, interoperability, and sellable data quality in data trade. Prior studies have proposed approaches for centralized and decentralized data markets (Ramachandran et al. 2018; Alvsvåg et al. 2022; Anthony 2023) and for local, federated, and domain-specific data markets (Yerabolu et al. 2019; Fernandez et al. 2020; Gröger 2021; Abbas et al. 2022). There are efforts to build an infrastructure, rules, principles, and standards for achieving trust, interoperability, and data sovereignty in data sharing (Eggers et al. 2020). ...
... The sellable data products can be based on Edge servers' Data and Processing Capabilities (EDPCs) that keep the data at the local level and do not transfer the "raw" or unprocessed data to the cloud (Palviainen and Suksi 2023). Enterprise data marketplaces have been developed that contain a metadata-based inventory for EDPCs in edge data lakes to enable the realization of applications based on local data (Gröger 2021). However, these marketplaces typically focus more on matching data supply and demand within the enterprise than on providing data for external users (Yerabolu et al. 2019). ...
... The following subsections analyze these four data supply strategies in more detail. • Dynamism in data offerings (Duchbrown et al. 2017; Fernandez et al. 2020) • Flexible data offerings (Fernandez et al. 2020) • Local data markets (Anthony 2023; Palviainen and Suksi 2023; Gröger 2021; Yerabolu et al. 2019) • Dynamism in data pricing (Liang et al. 2018) • Flexible pricing models (Liang et al. 2018) • Federated data markets (Abbas et al. 2022; Eggers et al. 2020) ...
Article
Full-text available
The smart city infrastructures, such as digital platforms, edge computing, and fast 5G/6G networks, bring new possibilities to use near-real-time sensor data in digital twins, AR applications, and Machine-to-Machine applications. In addition, AI offers new capabilities for data analytics, data adaptation, event/anomaly detection, and prediction. However, novel data supply and use strategies are needed when going toward higher-granularity data trade, in which a high volume of short-term data products is traded automatically in dynamic environments. This paper presents offering-driven data supply (ODS), demand-driven data supply (DDS), event and offering-driven data supply (EODS), and event and demand-driven data supply (EDDS) strategies for high-granularity data trade. Computer simulation was used as a method to evaluate the use of these strategies in the supply of air quality data for four user groups with different requirements for data quality, freshness, and price. The simulation results were stored as CSV files and analyzed and visualized in Excel. The simulation results and a SWOT analysis of the suggested strategies show that the choice between the strategies is case-specific. DDS increased efficiency in data supply in the simulated scenarios. There were higher profits and revenues and lower costs in DDS than in ODS. However, there are use cases that require the use of ODS, as DDS does not offer ready-prepared data for instant use. EDDS increased efficiency in data supply in the simulated scenarios. The costs were lower in EODS, but EDDS produced clearly higher revenues and profits.
... There were no self-contained data quality tools, and instead, the data quality work aimed at providing correct data to business processes and allowing for data reuse. Neither data quality nor data architectures followed coordinated approaches, leading to the emergence of data silos [45]. ...
... In this phase, organizations found that the previously established centralized data architectures could not cope with the surging number of data sources and formats [11,22]. To address these new requirements, decentralized data architectures comprising local data products and offering high scalability have become increasingly popular [12,45]. These data architectures follow the concepts of domain-driven design and incentivize the creation of high-quality data at the source [42]. ...
... Figure caption: Evolution of data quality tools in relation to data management [5] and data architecture developments. ... standards for interoperability and allow for cooperative approaches to data management [24,45]. ...
Article
Full-text available
Data ecosystems are a novel inter-organizational form of cooperation. They require at least one data provider and one or more data consumers. Existing research mainly addresses generativity mechanisms in this relationship, such as business models or role models for data ecosystems. However, an essential prerequisite for thriving data ecosystems is high data quality in the shared data. Without sufficient data quality, sharing data might lead to negative business consequences, given that the information drawn from them or services built on them might be incorrect or produce fraudulent results. We tackle precisely this issue by reporting on a multi-case study deploying data quality tools in data ecosystem scenarios. From these cases, we derive generalized prescriptive design knowledge as a design theory to make the knowledge available for others designing data quality tools for data sharing. Subsequently, our study contributes to integrating the issue of data quality in data ecosystem research and provides practitioners with actionable guidelines inferred from three real-world cases.
... In summary, it can be stated that both types of data platforms show rather contrasting properties and target different types of analytical applications. Hence, companies often need to leverage both of them [17]. In our previous work [8], we discuss four basic integration patterns for combining the capabilities of data warehouses and data lakes. ...
... The section "Data Warehouses and Data Lakes" described the limitations of traditional data warehouses, which finally resulted in the emergence of data lakes. However, the development and operation of data lakes poses challenges as well, which are also reflected in the baseline implementation and can be considered representative for many data lake in industrial practice [17]. This section discusses some of these challenges and thereby demonstrates the relevance and motivation for the development of novel architectural approaches for data platforms. ...
... Furthermore, this approach would result in the replication of data and possibly also in less up-to-date models, as the data has first to be extracted, transformed and transmitted before it can be exploited for data mining and machine learning. Therefore, in order to avoid these issues, data scientists should be able to directly access the data on the storage system of the data platform with their preferred tools and libraries, such as MLlib, TensorFlow 16 and scikit-learn 17 , without needing to export the data first. Since technical metadata may contain important information about the structure of the stored data, e.g. in terms of partitions, the locations of data files and the composition of data collections, and may be necessary in order to ensure atomicity and isolation (cf. ...
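As a rough illustration of such in-place access, the following sketch reads a partitioned Parquet dataset directly from a hypothetical data lake path with pandas/pyarrow and trains a scikit-learn model on it without an intermediate export; the path and column names are invented for the example and are not taken from the cited work.

```python
# Minimal sketch of in-place data access for data science, assuming a hypothetical
# partitioned Parquet dataset at /datalake/quality/measurements/ with a binary
# "scrap" label column; real platforms would add authentication and partition pruning.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Read the Parquet files where they live on the data platform (pyarrow engine),
# instead of extracting and replicating them into a separate analytics store.
df = pd.read_parquet("/datalake/quality/measurements/")

X = df.drop(columns=["scrap"])   # sensor and process features
y = df["scrap"]                  # quality label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```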
Article
Full-text available
In the context of data analytics, so-called lakehouses refer to novel variants of data platforms that attempt to combine characteristics of data warehouses and data lakes. In this way, lakehouses promise to simplify enterprise analytics architectures, which often suffer from high operational costs, slow analytical processes and further shortcomings resulting from data replication. However, different views and notions on the lakehouse paradigm exist, which are commonly driven by individual technologies and varying analytical use cases. Therefore, it remains unclear what challenges lakehouses address, how they can be characterized and which technologies can be leveraged to implement them. This paper addresses these issues by providing an extensive overview of concepts and technologies that are related to the lakehouse paradigm and by outlining lakehouses as a distinct architectural approach for data platforms. Concepts and technologies from literature with regard to lakehouses are discussed, based on which a conceptual foundation for lakehouses is established. In addition, several popular technologies are evaluated regarding their suitability for the building of lakehouses. All findings are supported and demonstrated with the help of a representative analytics scenario. Typical challenges of conventional data platforms are identified, a new, sharper definition for lakehouses is proposed and technical requirements for lakehouses are derived. As part of an evaluation, these requirements are applied to several popular technologies, of which frameworks for data lakes turn out to be particularly helpful for the construction of lakehouses. Our work provides an overview of the state of the art and a conceptual foundation for the lakehouse paradigm, which can support future research.
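To give a concrete flavor of the paradigm, here is a small, hedged sketch using the open-source deltalake Python package (delta-rs bindings): it writes batches as a transactional table on plain file storage and reads them back via the transaction log, which is one common way lakehouse properties are realized in practice. The table path and columns are illustrative assumptions, not drawn from the cited paper.

```python
# Illustrative lakehouse-style sketch, assuming the open-source `deltalake` package;
# the table path and columns are made up for the example.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

table_path = "/datalake/sales_orders"  # plain file/object storage, no warehouse engine

# Write batches as a Delta table: the transaction log provides atomic, isolated
# commits on top of open Parquet files.
write_deltalake(table_path, pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 25.5]}))
write_deltalake(table_path, pd.DataFrame({"order_id": [3], "amount": [7.9]}), mode="append")

dt = DeltaTable(table_path)
print("table version:", dt.version())  # versioned history from the transaction log
print(dt.to_pandas())                  # the same files stay directly readable for ML tools
```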
... [DE7] Gröger [16] calls for a data ecosystem for industrial enterprises, see Figure 4. That ecosystem contains a specific role for data engineers and data engineering as part of the data democratization challenge: "making all kinds of data available for AI for all kinds of end users across the entire enterprise". ...
... Paper DE6 by Raj et al. [29] has a rather fuzzy definition of data engineering as a step that "performs two different operations at the high level, which include data collection and data ingestion". Paper DE7 by Gröger [16] defines data engineering as "modelling, integrating and cleansing of data." Paper DE13 by Cheng and Long [10] says "the raw data in each entity is extracted, transformed, and prepared for model training". ...
... Figure caption: A data ecosystem for industrial enterprises [16] ...
Conference Paper
Full-text available
AI systems cannot exist without data. Now that AI models (data science and AI) have matured and are readily available to apply in practice, most organizations struggle with the data infrastructure to do so. There is a growing need for data engineers who know how to prepare data for AI systems or who can set up enterprise-wide data architectures for analytical projects. But until now, the data engineering part of AI engineering has not been getting much attention, in favor of discussing the modeling part. In this paper we aim to change this by performing a mapping study on data engineering for AI systems, i.e., AI data engineering. We found 25 relevant papers between January 2019 and June 2023 explaining AI data engineering activities. We identify which life cycle phases are covered, which technical solutions or architectures are proposed and which lessons learned are presented. We end with an overall discussion of the papers with implications for practitioners and researchers. This paper creates an overview of the body of knowledge on data engineering for AI. This overview is useful for practitioners to identify solutions and best practices as well as for researchers to identify gaps.
... The term data marketplace refers to the platform built to facilitate this exchange. In the company-internal context, the data marketplace is referred to as an Enterprise Data Marketplace [26,55] or an internal data marketplace [19]. Extending Wells' [56] definition, we propose the following: ...
... Lastly, we conduct an experiment based on this prototype (6) evaluating the impact of introducing an Enterprise Data Marketplace in a company. The content of this paper is based on several interdisciplinary works we compiled throughout assorted research projects, combining comprehensive research with practical experience from an industrial perspective [13,15,16,22,23,26]. ...
... The Enterprise Data Marketplace is addressed in only a few research articles. Amongst others, Gröger [26] highlights the need for this specific marketplace type, Fernandez et al. [19] consider them to bring down data silos, and Wells [55] defines and presents the EDMP in a report. Driessen et al. [12] present data marketplace types with problems and solution approaches, one of which is called the generalist and can be established within a single large company and thus encompasses, but is not limited to, the EDMP. ...
Article
Full-text available
In this big data era, multitudes of data are generated and collected which contain the potential to gain new insights, e.g., for enhancing business models. To leverage this potential through, e.g., data science and analytics projects, the data must be made available. In this context, data marketplaces are used as platforms to facilitate the exchange and thus, the provisioning of data and data-related services. Data marketplaces are mainly studied for the exchange of data between organizations, i.e., as external data marketplaces. Yet, the data collected within a company also has the potential to provide valuable insights for this same company, for instance to optimize business processes. Studies indicate, however, that a significant amount of data within companies remains unused. In this sense, it is proposed to employ an Enterprise Data Marketplace, a platform to democratize data within a company among its employees. Specifics of the Enterprise Data Marketplace, how it can be implemented or how it makes data available throughout a variety of systems like data lakes have not been investigated in the literature so far. Therefore, we present the characteristics and requirements of this kind of marketplace. We also distinguish it from other tools like data catalogs, provide a platform architecture and highlight how it integrates with the company’s system landscape. The presented concepts are demonstrated through an Enterprise Data Marketplace prototype and an experiment reveals that this marketplace significantly improves the data consumer workflows in terms of efficiency and complexity. This paper is based on several interdisciplinary works combining comprehensive research with practical experience from an industrial perspective. We therefore present the Enterprise Data Marketplace as a distinct marketplace type and provide the basis for establishing it within a company.
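As a purely illustrative sketch (not the prototype described in the article), the following toy Python classes show the metadata-driven core idea of an enterprise data marketplace: data products are published with ownership and access metadata and can then be found by consumers via self-service search. All class names and fields are assumptions introduced for the example.

```python
# Toy sketch of a metadata-driven marketplace inventory; every field name here is an
# assumption for illustration, not taken from the cited prototype.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner: str                   # accountable data owner (governance role)
    source_system: str           # e.g. an ERP system or a data lake zone
    access_options: list[str]    # e.g. ["sql-view", "rest-api", "file-export"]
    tags: list[str] = field(default_factory=list)

@dataclass
class Marketplace:
    catalog: list[DataProduct] = field(default_factory=list)

    def publish(self, product: DataProduct) -> None:
        """Data providers list their product with descriptive metadata."""
        self.catalog.append(product)

    def search(self, keyword: str) -> list[DataProduct]:
        """Data consumers discover products via self-service metadata search."""
        kw = keyword.lower()
        return [p for p in self.catalog
                if kw in p.name.lower() or any(kw in t.lower() for t in p.tags)]

mp = Marketplace()
mp.publish(DataProduct("machine-downtimes", "plant-analytics-team", "MES data lake zone",
                       ["sql-view"], tags=["maintenance", "oee"]))
print([p.name for p in mp.search("maintenance")])
```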
... For several reasons, a high level of data quality (DQ) is vital for organizations. It is required to secure organizational agility and avoid harmful societal effects of automated decision-making (Gröger, 2021;Marjanovic et al., 2021). Moreover, ensuring correct and high-quality data sets creates trust in data ecosystems and is becoming part of legislation (Geisler et al., 2021). ...
... Despite the paramount importance of DQ, many organizations struggle to provide data of adequate quality to business processes, thus impeding the success of digital transformations (Gröger, 2021;Legner et al., 2020). Most significantly, the context in which DQ tools operate is changing. ...
... Most significantly, the context in which DQ tools operate is changing. The proliferation of big data uncovered a lack of scalability in centralized data management tools and efforts (Gröger, 2021). Consequently, organizations started to decentralize their data architectures (e.g., data mesh), leading to a distribution of the DQ work and DQ tools being used at the source (Dehghani, 2019;Redman, 2020). ...
Conference Paper
Full-text available
Organizations strive to succeed in the ongoing digital transformation, and central to this is the quality of data as a major source of business innovation. Data quality tools promise to increase the quality of data by managing and automating the different tasks of data quality management. However, established tools often lack support for the fundamental changes accompanying an ongoing digital transformation, such as data mesh architectures. In this paper, we propose a software reference architecture for data quality tools that guides organizations in creating state-of-the-art solutions. Our reference architecture is based on the knowledge captured from ten data quality tools described in the scientific literature. For evaluation, we conducted two qualitative focus group discussions using the adapted architecture tradeoff analysis method as a basis. Our findings reveal that the proposed reference architecture is well-suited for creating successful data quality tools and can help organizations assess offerings in the market.
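To make the notion of a data quality tool more tangible, here is a minimal, hedged sketch of rule-based checks along common quality dimensions (completeness, uniqueness, validity) using pandas; the column names, rules and example data are assumptions and are not part of the reference architecture described above.

```python
# Minimal sketch of dimension-based data quality checks; columns and rules are assumed
# for illustration only.
import pandas as pd

def dq_report(df: pd.DataFrame) -> dict:
    return {
        # completeness: share of non-null cells per column
        "completeness": (1 - df.isna().mean()).round(3).to_dict(),
        # uniqueness: the business key must not contain duplicates
        "order_id_unique": bool(df["order_id"].is_unique),
        # validity: domain rule on a numeric attribute (missing values count as invalid)
        "valid_amount_share": float((df["amount"] >= 0).mean()),
    }

df = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, None, -3.0]})
print(dq_report(df))
```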
... Data quality has also become an important topic for machine learning practitioners [32], [43], who are looking to shift their focus from "goodness-of-fit" to "goodness-of-data" [44]. More broadly than just data quality, industry leaders are calling for a shift towards a data ecosystem for industrial enterprises that will unite data producers' data sources with consumers through data application platforms [45]. This last approach may offer some promise to address one of the causes identified in our study, i.e., the lack of focus on data by those developing source systems. ...
... However, a recent "practical investigation" of data governance structures in industrial enterprises highlighted the rudimentary implementation of organisational structures to support data governance [45]. Often the data owners are the same as the source system owner, leading to multiple approvals required when multiple data sources are combined in a solution. ...
... There is emerging research about tools and techniques in the area of metadata management and data catalogues [45]. Understanding data in data-intensive solution development requires a deep understanding of the target questions to be answered by the solution and also the source data domain and context of data collection. ...
Article
Full-text available
The predicted increase in demand for data-intensive solution development is driving the need for software, data, and domain experts to effectively collaborate in multi-disciplinary data-intensive software teams (MDSTs). We conducted a socio-technical grounded theory study through interviews with 24 practitioners in MDSTs to better understand the challenges these teams face when delivering data-intensive software solutions. The interviews provided perspectives across different types of roles including domain, data and software experts, and covered different organisational levels from team members and team managers to executive leaders. We found that the key concern for these teams is dealing with data-related challenges. In this paper, we present a theory of dealing with data challenges that explains the challenges faced by MDSTs including gaining access to data, aligning data, understanding data, and resolving data quality issues; the context in which and the conditions under which these challenges occur, the causes that lead to the challenges, and the related consequences such as having to conduct remediation activities, inability to achieve expected outcomes and lack of trust in the delivered solutions. We also identified contingencies or strategies applied to address the challenges including high-level strategic approaches such as implementing data governance, implementing new tools and techniques such as data quality visualisation and monitoring tools, as well as building stronger teams by focusing on people dynamics, communication skill development and cross-skilling. Our findings have direct implications for practitioners and researchers to better understand the landscape of data challenges and how to deal with them.
... To implement and support these activities, data catalogs (DCs) play an important role as they (a) empower users to work with data; (b) make data-related issues visible; (c) reduce data preparation time and (d) promote compliant data handling and usage [1,8,28]. For a holistic metadata management approach, DCs need to be integrated into the existing enterprise data ecosystem [14]. This includes the integration with upstream data sources, downstream analytics applications, and further tools for data curation as part of a metadata management landscape. ...
... As the spectrum of DC applications and their capabilities remain undefined, it is demanding for practitioners to select the right tools to build such a tool landscape [8]. Second, the successful technical integration of DCs depends on several factors including automatic data source integration, DC federation, and data access provisioning [14]. Yet, lacking clarity about these characteristics hampers the usage of DCs as fundamental components of metadata management landscapes. ...
... While the general positioning and role of DCs in an integrated metadata management landscape is clear, organizations continue to face challenges in building concrete implementations. These challenges include mapping capabilities to the variety of metadata management tools, or integrating those tools with each other and with existing data sources [9,14]. Accordingly, there is still a need for greater clarity about the types and core properties of DC applications in the enterprise sphere. ...
Article
Full-text available
Despite investing heavily in data-related technology and human resources, enterprises are still struggling to derive value from data. To foster data value creation and move toward a data-driven enterprise, adequate data management and data governance practices are fundamental. To support these practices, organizations are building (meta)data management landscapes by combining different tools. Data catalogs are a central part of these landscapes as they enable an overview of available data assets and their characteristics. To deliver their highest value, data catalogs need to be integrated with existing data sources and other data management tools. However, enterprises struggle with data catalog integration because (a) not all data catalog application types foster enterprise-wide data management and data governance alike, and (b) several technical characteristics of data catalog integration remain unclear. These include the supported data sources, data catalog federation, and ways to provision data access. To tackle these challenges, this paper first develops a typology of data catalog applications in the enterprise context. Based on a review of the academic literature and an analysis of data catalog offerings, it identifies four enterprise-internal and three cross-enterprise classes of data catalog applications. Second, an in-depth analysis of 51 data catalog offerings that foster enterprise-wide metadata management examines key characteristics of the technical integration of data catalogs.
... Therefore, data democratization initiatives with the goal of empowering and motivating employees to find, understand, access, use and share data across the company [2], are gaining importance. To drive democratization aspects such as data sharing across the company, the use of enterprise data marketplaces has been proposed [3]. In general, data marketplaces are metadata-driven self-service platforms for trading data and data related services [3,4]. ...
... To drive democratization aspects such as data sharing across the company, the use of enterprise data marketplaces has been proposed [3]. In general, data marketplaces are metadata-driven self-service platforms for trading data and data related services [3,4]. Enterprise data marketplaces are specifically designed to facilitate the exchange of data and data related services within a company [5]. ...
... In order to identify the data provider's assignments and associated processes within an enterprise, we conducted a literature study including [3, 9-12]. Yet, we found that many articles focus on the consumer perspective as opposed to the provider perspective or only describe very abstract insights into the provider's processes. ...
Conference Paper
In the big data era companies have an increasing volume of data at their disposal. To enable the democratization of this data so it can be found, understood and accessed by the majority of employees, so-called data providers must first publish the data and provide provisioning options. However, a lack of incentives and increased effort for the data providers to share their data hinder the democratization of data. In this work, we present the current state and challenges of a data provider’s journey, derived from a literature study as well as expert interviews we conducted with a globally active manufacturer. To address these challenges, we propose the use of an enterprise data marketplace, a platform for sharing data within the company. By presenting a functionality framework for such a marketplace and by highlighting how it can integrate with a company’s data catalog, we outline how a marketplace can support the data provider. We implemented a prototype of an enterprise data marketplace and determined the feasibility of three scenarios to relieve the data provider. Finally, an assessment based on the prototype shows that the data marketplace supports the provider throughout the provider’s journey, addresses major challenges, and thus, contributes to the overall goal of data democratization within enterprises.
... Tab. 2). Integration of AI systems [8,10,16]: inner, outer. Deployment [3,4]: single, continuous. Project trigger [3,13]: data, problem, solution. Integration [6,10,16]: greenfield, brownfield. Team size [33]: < 9, >= 9. Share of AI [6,13]: low, medium, high. Interaction [2]: linear, non-linear. Data sensitivity [22]: low, high. Data availability [7,13]: none, insufficient, sufficient. ...
... But without data, no AI. Availability of sufficient data is a crucial point in AI projects, driving design decisions [7]. For instance, in the case of insufficient data, the available data lacks relevant features, so additional data sources have to be explored and acquired (data crawling). ...
... In general, data warehouses and data lakes focus on data management, whereas data lakehouses, data mesh, and data fabric broadly refer to data architectures [15]. Data management systems and architectures differ in the level of abstraction. ...
... The third challenge, as reported by interviewees, relates to the data product model. Data products store various data types and also offer metadata, aiding users without domain knowledge in data interpretation [3], [15]. However, interviewees point out a gap between the descriptions provider domains deliver and the information consumer domains need to correctly interpret the data. ...
Article
Full-text available
With the increasing importance of data and artificial intelligence, organizations strive to become more data-driven. However, current data architectures are not necessarily designed to keep up with the scale and scope of data and analytics use cases. In fact, existing architectures often fail to deliver the promised value associated with them. Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management. As the concept of data mesh is still novel, it lacks empirical insights from the field. Specifically, an understanding of the motivational factors for introducing data mesh, the associated challenges, implementation strategies, its business impact, and potential archetypes is missing. To address this gap, we conduct 15 semi-structured interviews with industry experts. Our results show, among other insights, that organizations have difficulties with the transition toward federated data governance associated with the data mesh concept, the shift of responsibility for the development, provision, and maintenance of data products, and the comprehension of the overall concept. In our work, we derive multiple implementation strategies and suggest organizations introduce a cross-domain steering unit, observe the data product usage, create quick wins in the early phases, and favor small dedicated teams that prioritize data products. Whereas we acknowledge that organizations need to apply implementation strategies according to their individual needs, we also deduce two archetypes that provide suggestions in more detail. Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
... Data for these use cases are collected from a variety of source systems and often analyzed in specifically developed data analytics solutions. This results in a heterogeneous data management and analytics landscape built on a wide variety of technologies [4], owned by various teams across business units: from enterprise resource planning (ERP) systems under the governance of source system teams to on-premise and cloud-based data management platforms run by a central IT department to various data analytics tools. Figure 1 depicts the interactions in such a heterogeneous landscape, based on a practical implementation at a globally active manufacturer. ...
... Especially cloud providers tend to offer a wide variety of tools and services for data exchange, such as pipelining and visualization tools, which increases the complexity of setting up such a data pipeline. In fact, delivering data to consumers poses a major challenge to leveraging data's value [4]. Coordination between different parties (source system owners, central IT providers of e.g., the data lake, and business unit IT) is complex and time-consuming, and much time is spent on developing fitting interfaces. ...
Article
Full-text available
Through data analytics, enterprises can exploit the value their data hold. However, there are still various challenges to be solved, one of them being how to consume data in heterogeneous data management landscapes. To address this challenge, we developed a systematic, hierarchical approach to data consumption, including six data consumption patterns, which is presented in this paper. Each of the six patterns can be associated with multiple implementation patterns that detail its technical realization. We report the application of these patterns in a real-world practical scenario and discuss the benefits of applying the data consumption patterns.
... Consequently, analytics results have not been effectively deployed in most manufacturing systems (Gröger, 2021). Platform approaches are promising for overcoming these issues. ...
... These two user groups are data experts. A dedicated environment is required because a heterogeneous landscape of data sources and data management techniques requires specialist knowledge (Beecks et al., 2018; Gröger, 2021). For data management, a single source of truth shall be constituted by storing and providing all relevant data, as suggested by, e.g., Schuh et al. (2020). ...
Article
Full-text available
Digital transformation is driving the current technological trends in manufacturing. An integral constituent is communication between machines, between machines and humans, or between machines and products. This extensive communication involves large volumes of data. Many manufacturers apply data analytics (e.g., for quality management or improvement purposes) to translate the data into business value. However, isolated, rigid, and area-specific IT solutions often carry this out. Today’s complex manufacturing requires quality management approaches that constitute a holistic view and understanding of process–product interactions along the process chain instead of focusing solely on single processes. A novel platform approach to support quality management in manufacturing systems is proposed in this paper to overcome this deficit. It integrates state-of-the-art concepts of IT with modeling approaches for planning and operation of quality management. A conceptual framework and the technical architecture for implementing a digitalization platform are presented in this regard. Moreover, the approach is validated and implemented within a web application based on a use case of data-driven quality management in electronics production.
... The potential for AI has been enhanced by the recent and future enormous growth of data. However, this precious data raises tedious challenges, such as data quality assessment and, according to [18], data management, data democratization and data provenance. Until recently, both academia and industry were mainly engaged in introducing new and improving existing ML models, rather than finding remedies for any data challenges that fall beyond trivial cleaning or preparation steps. ...
... In the field of data management, data quality is a well-studied topic that has been a major concern of organizations for decades, leading to the introduction of standards and quality frameworks (Batini and Scannapieco [4]; Wang and Strong [53]). The recent advances in AI have brought data quality back into the spotlight in the context of building "data ecosystems" that cope with emerging data challenges posed by AI-based systems in enterprises [18]. Researchers have pointed out such challenges, including data quality issues [20], data life cycle concerns [37], the connection to MLOps [39], and model management [43]. ...
Preprint
Full-text available
Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example, incomplete, erroneous or inappropriate training data can lead to unreliable models that produce ultimately poor decisions. Trustworthy AI applications require high-quality training and test data along many dimensions, such as accuracy, completeness, consistency, and uniformity. We explore empirically the correlation between six of the traditional data quality dimensions and the performance of fifteen widely used ML algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining ML results in terms of data quality. Our experiments distinguish three scenarios based on the AI pipeline steps that were fed with polluted data: polluted training data, test data, or both. We conclude the paper with an extensive discussion of our observations and recommendations, alongside open questions and future directions to be explored.
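The following small sketch mirrors the kind of experiment described above in spirit only: it pollutes a share of the training labels and compares the resulting test accuracy with a model trained on clean data. The dataset, model and noise rates are illustrative assumptions, not the study's actual setup.

```python
# Hedged sketch: measure how polluted training data (here: flipped labels) degrades
# test performance; dataset, model and noise rates are assumptions for illustration.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy_with_label_noise(noise_rate: float) -> float:
    rng = np.random.default_rng(0)
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise_rate   # corrupt a fraction of training labels
    y_noisy[flip] = 1 - y_noisy[flip]
    model = LogisticRegression(max_iter=5000).fit(X_tr, y_noisy)
    return model.score(X_te, y_te)                 # the test data stays clean

for rate in (0.0, 0.1, 0.3):
    print(f"label noise {rate:.0%}: test accuracy {accuracy_with_label_noise(rate):.3f}")
```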
... The potential for AI has been enhanced by the recent and future enormous growth of data. However, this precious data raises tedious challenges, such as data quality assessment and, according to [1], data management, data democratization and data provenance. ...
... They act as centralized repositories, making it easier for data consumers to discover, understand, and access the information they need [26]. Enterprise data management platforms often comprise such centralized data catalogs, implying that data is stored within their peripheries [25,27]. If data are not encapsulated within the organization but integrated into decentral or federated networks [28], the literature commonly refers to such environments as data ecosystems with metadata catalogs as a key function [5,14]. ...
Article
Full-text available
Background: In the European health care industry, recent years have seen increasing investments in data ecosystems to “FAIRify” and capitalize the ever-rising amount of health data. Within such networks, health metadata catalogs (HMDCs) assume a key function as they enable data allocation, sharing, and use practices. By design, HMDCs orchestrate health information for the purpose of findability, accessibility, interoperability, and reusability (FAIR). However, despite various European initiatives pushing health care data ecosystems forward, actionable design knowledge about HMDCs is scarce. This impedes both their effective development in practice and their scientific exploration, causing huge unused innovation potential of health data. Objective: This study aims to explore the structural design elements of HMDCs, classifying them alongside empirically reasonable dimensions and characteristics. In doing so, the development of HMDCs in practice is facilitated while also closing a crucial gap in theory (ie, the literature about actionable HMDC design knowledge). Methods: We applied a rigorous methodology for taxonomy building following well-known and established guidelines from the domain of information systems. Within this methodological framework, inductive and deductive research methods were applied to iteratively design and evaluate the evolving set of HMDC dimensions and characteristics. Specifically, a systematic literature review was conducted to identify and analyze 38 articles, while a multicase study was conducted to examine 17 HMDCs from practice. These findings were evaluated and refined in 2 extensive focus group sessions by 7 interdisciplinary experts with deep knowledge about HMDCs. Results: The artifact generated by the study is an iteratively conceptualized and empirically grounded taxonomy with elaborate explanations. It proposes 20 dimensions encompassing 101 characteristics alongside which FAIR HMDCs can be structured and classified. The taxonomy describes basic design characteristics that need to be considered to implement FAIR HMDCs effectively. A major finding was that a particular focus in developing HMDCs is on the design of their published dataset offerings (ie, their metadata assets) as well as on data security and governance. The taxonomy is evaluated against the background of 4 use cases, which were cocreated with experts. These illustrative scenarios add depth and context to the taxonomy as they underline its relevance and applicability in real-world settings. Conclusions: The findings contribute fundamental, yet actionable, design knowledge for building HMDCs in European health care data ecosystems. They provide guidance for health care practitioners, while allowing both scientists and policy makers to navigate through this evolving research field and anchor their work. Therefore, this study closes the research gap outlined earlier, which has prevailed in theory and practice.
... At the core of a data ecosystem, data platforms provide the technical infrastructure for processing and managing data from diverse sources, enabling various data applications. These platforms often incorporate data marketplaces, which serve as self-service platforms that connect data producers and consumers (Gröger, 2021). Another closely related concept is data spaces, which are frequently used to describe data-sharing ecosystems across organizations and thus will be used as synonyms in this paper (Otto et al., 2019). ...
Article
Full-text available
Given the critical role of data availability for growth and innovation in financial services, especially small and mid-sized banks lack the data volumes required to fully leverage AI advancements for enhancing fraud detection, operational efficiency, and risk management. With existing solutions facing challenges in scalability, inconsistent standards, and complex privacy regulations, we introduce a synthetic data sharing ecosystem (SynDEc) using generative AI. Employing design science research in collaboration with two banks, among them UnionBank of the Philippines, we developed and validated a synthetic data sharing ecosystem for financial institutions. The derived design principles highlight synthetic data setup, training configurations, and incentivization. Furthermore, our findings show that smaller banks benefit most from SynDEcs and our solution is viable even with limited participation. Thus, we advance data ecosystem design knowledge, show its viability for financial services, and offer practical guidance for privacy-resilient synthetic data sharing, laying the groundwork for future applications of SynDEcs.
... Modern industrial organizations need to maintain and enhance their competitive advantage by becoming more data driven. Due to the myriad of information systems that industrial organizations use to manage their daily operations, industrial business systems produce massive amounts of data, knowledge, and information [2]. Industries need to process these data to identify patterns and trends that make them more predictive. ...
... Recent research has advanced our understanding of data sharing issues and data governance practices (Lis & Otto, 2020;Costabile & Ovrelid, 2023). Knowledge about data quality issues in data ecosystems is still limited, with most research focusing on interorganizational ecosystems and little attention to large companies as independent ecosystems (Gröger, 2021). While Jussen et al. (2024) address general issues, there is a clear gap in understanding specific data quality issues. ...
Conference Paper
Full-text available
This paper explores challenges in intra-organizational data ecosystems, with a focus on data quality and its impact on organizations. Data quality is essential for optimizing data integration and interoperability within large companies, which often function as independent ecosystems. However, data sharing is frequently hindered by various issues. Through expert interviews with two distinct business units at a leading global tech firm, the study identifies six data issues, with technical incompatibilities as the main cause of data quality problems. It reveals that independent technical decisions by one actor can significantly affect others, highlighting the need to balance individual requirements with overall data quality improvement.
... During the past decades, the major focus of both researchers and practitioners in this area has been on model-centric AI, i.e., trying to improve the performance by choosing the most appropriate architecture, learning algorithm, and hyperparameters, while typically relying on standard training data sets, benchmark validation and test data sets. In contrast, during the past several years, the AI community has started to openly acknowledge the challenges and opportunities associated with data-centric (Gröger, 2021; Sambasivan et al., 2021), but also human-centric AI (Jarrahi et al., 2023). Some of the respective challenges and principles here include the systematic improvement of data fit and data consistency for the domain application at hand (while acknowledging the human-centeredness of such data-related activities), mutual improvement of model and data through iteration, and interaction between domain experts and AI as a sociotechnical system (Jarrahi et al., 2023). ...
... All these available data sources must be organised and structured in an appropriate way so that they can be automatically managed, integrated, and processed, providing flexibility in the incorporation of updates in the sources, as well as enabling traceability of the results to enrich the analysis. In this way, instead of basing the analysis process on independent tools for each specific task in the pipeline (from data collection to the final building reuse decisions), an end-to-end solution is required to overcome the limitations of current practices [29], providing a single data analysis framework spanning the whole life cycle. ...
Article
Full-text available
This article introduces a methodology for a novel data-driven computational model aimed at aiding public administrations in managing and evaluating the adaptative reuse of buildings while tackling ecological and digital challenges. Drawing from the 2030 Agenda for Sustainable Development, the study underscores the significance of innovative approaches in harnessing the economic potential of data. Focusing on Barcelona’s Ciutat Vella district, the research selects five historic public buildings for analysis, strategically positioned to spur local entrepreneurship and counteract tourism dominance. Through an extensive literature review, the article identifies a gap in computational models for building adaptative reuse and proposes a methodological framework that integrates data collection, processing, and computational modelling, underscored by GIS technology and open data sources. The proposed methodology for a computational algorithm aims to systematise spatial characteristics, assess programmatic needs, and optimise building usage, while addressing challenges such as data integration and quality assurance. Ultimately, the research presents a pioneering approach to building adaptative reuse, aimed at fostering sustainable urban development and offering replicable insights applicable to similar challenges in other cities.
... The quality of the training set plays a central role in machine learning and statistical modeling applications [39]. Clear, consistent, and informative data improve model accuracy in prediction and pattern recognition. ...
Article
Full-text available
Assuming climatic homogeneity is no longer acceptable in greenhouse farming since it can result in less-than-ideal agronomic decisions. Indeed, several approaches have been proposed based on installing sensors in predefined points of interest (PoIs) to obtain a better mapping of climatic conditions. However, these approaches suffer from two main problems, i.e., identifying the most significant PoIs inside the greenhouse and placing a sensor at each PoI, which may be costly and incompatible with field operations. As regards the first problem, we propose a genetic algorithm to identify the best sensing places based on the agronomic definition of zones of interest. As regards the second problem, we exploit agricultural robots to collect climatic information to train a set of virtual sensors based on recurrent neural networks. The proposed solution has been tested on a real-world dataset regarding a greenhouse in Verona (Italy).
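As a toy illustration of the placement idea only (not the authors' algorithm; all coordinates, zone definitions and GA settings are invented), the sketch below uses a simple genetic algorithm to pick k sensor positions from a set of candidates so that the predefined points of interest are close to at least one selected sensor.

```python
# Toy genetic algorithm for sensor placement; all coordinates and parameters are
# invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
candidates = rng.uniform(0, 100, size=(40, 2))  # candidate sensor positions (x, y)
pois = rng.uniform(0, 100, size=(8, 2))         # points/zones of interest to cover
k, pop_size, generations = 5, 60, 150

def fitness(individual: np.ndarray) -> float:
    # negative mean distance from each PoI to its nearest selected sensor (higher is better)
    chosen = candidates[individual]
    dists = np.linalg.norm(pois[:, None, :] - chosen[None, :, :], axis=2)
    return -dists.min(axis=1).mean()

def make_child(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # crossover: draw k distinct positions from the union of both parents
    pool = np.unique(np.concatenate([a, b]))
    child = rng.choice(pool, size=k, replace=False)
    if rng.random() < 0.3:                      # mutation: swap in a new candidate position
        new_idx = rng.integers(len(candidates))
        if new_idx not in child:
            child[rng.integers(k)] = new_idx
    return child

population = [rng.choice(len(candidates), size=k, replace=False) for _ in range(pop_size)]
for _ in range(generations):
    population.sort(key=fitness, reverse=True)  # elitist selection of the better half
    parents = population[: pop_size // 2]
    children = [make_child(parents[rng.integers(len(parents))],
                           parents[rng.integers(len(parents))]) for _ in parents]
    population = parents + children

best = max(population, key=fitness)
print("selected sensor positions:\n", candidates[best].round(1))
```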
... Similarly, there is an ethical question of whether it is appropriate to feed raw audio and video data, which is typically not de-identified, into automated transcription programs as this may not be explicitly addressed in participant consent documents outlining how their data will be protected. AI functions through the processing of large amounts of data (Gröger, 2021), and therefore it is worth considering how appropriate it is for raw interview responses that participants expect to be protected to be fed into a system that will actively use them for future transcriptions and other AI-generated products and resources (e.g., Head et al., 2023). The ethical question persists of how participants expect their data (audio and written) to be used and protected. ...
Article
Artificial Intelligence (AI) and other large language models are rapidly infiltrating the world of education and educational research. These new technological developments raise questions about use and ethics throughout the world of educational research, particularly for qualitative methods given the philosophical and structural foundations of its associated designs. This paper seeks to interrogate the perceived ethics around the use of AI in qualitative research and draws on survey data from qualitative researchers ( n = 101) collected from April-May 2023. Findings indicate that researchers were more apt to embrace the use of AI for transcription purposes, and to a lesser extent for preliminary coding. Researchers from high research productivity (R1) universities were generally less accepting of AI's use in the research process than other researchers.
... It is a versatile field applicable to diverse industries, from finance to healthcare. Indeed, it is hard to imagine Artificial Intelligence thriving without good data management [18]. At the same time, it is difficult to imagine the future of Data Science without it walking hand in hand with Artificial Intelligence. ...
Conference Paper
This document presents the training guidelines in the area of Computing for Bachelor's degree programs in Data Science (RF-CD-21). These guidelines were built around the notion of competence, in line with the competences defined by the Association for Computing Machinery (ACM) Data Science Task Force in 2021 (ACM Data Science Task Force (2021)). As the SBC has done in preparing a Reference Curriculum for other areas of Computing, the 17 (seventeen) competences identified as necessary are summarized in 8 (eight) training axes, in order to facilitate the construction of curricula at Brazilian Higher Education Institutions (IES). Each training axis lists the contents considered useful for developing the necessary competences. Finally, these guidelines seek to guide the construction of a Pedagogical Course Project (PPC) for undergraduate Data Science programs at the IES, providing flexibility so that each institution can define its PPC according to its own vocation and objectives.
... By analysing various aspects of integrating AI technologies into data engineering, successful strategies and solutions have been identified that affect data security, quality and integrity, standardisation, optimisation of big data processing, and other key aspects (Talib et al., 2021; Gröger, 2021; Ebid, 2021). Table 1 summarises the main challenges of AI integration and their solutions. ...
Article
The integration of artificial intelligence technologies into data engineering has gained significant relevance in the context of constantly growing volumes and complexity of data, which require innovative approaches to processing and analysis. The goal of the present study is to conduct a deep analysis of the implementation of artificial intelligence in data engineering with a focus on the challenges and perspectives of this process. Research methods such as analysis, comparison, systematisation, and a systemic approach were used to study this phenomenon objectively and reveal its key aspects. The analysis revealed key challenges, which include the variety and instability of data, the importance of standardisation, and ensuring the security of large data volumes. The importance of ethical aspects is underlined, and perspectives on the automation of analytical processes and improving prognostic analysis were also determined. According to the results, the employment of common standards improves the consistency of approaches, whereas improved algorithms accelerate the processing of large data volumes. The employment of technologies such as Apache Hadoop and Spark for processing large data volumes and the step-by-step introduction of artificial intelligence are also useful. Increased explicability of decisions also improves their understanding, simplifying interaction between experts and interested parties, and simultaneously creating conditions for the effective implementation and employment of integrated artificial intelligence systems in data engineering. The compilation of ethical standards and legal mechanisms creates an opportunity for responsible and balanced employment of these technologies, ensuring trust and ethical compliance in the process of their implementation into various spheres of human activity. These results outline perspectives for the development of this sphere and highlight its importance in a modern information-based society. The integration of artificial intelligence into data engineering expands the capabilities of automating analytical processes, ensuring accurate predictions, and reducing manual labour expenses, creating opportunities for effective management and reasoned decision-making in the data processing sphere.
... Data is a crucial element of AI, and Gröger claims that there will be no AI without data [30]. Data, however, has also introduced a number of challenges for public sector organizations, including navigating the European General Data Protection Regulation (GDPR) and Norwegian legal frameworks as well as known and unknown bias in data [31,32]. ...
Chapter
Full-text available
This paper presents a study of the use of artificial intelligence (AI) in the Norwegian public sector. The study focused particularly on projects involving personal data, which adds a risk of discriminating against individuals and social groups. The study included a survey of 200 public sector organizations and 19 interviews with representatives for AI projects involving personal data. The findings suggest that AI development in the public sector is still immature, and few projects involving personal data have reached the stage of production. Political pressure to use AI in the sector is significant. Limited knowledge of and focus on AI development among management have made individuals and units with the resources and interest in experimenting with AI an important driving force. The study found that the journey from idea to production of AI in the public sector presents many challenges, which often leads to projects being temporarily halted or terminated. While AI can contribute to the streamlining and improvement of public services, it also involves risks and challenges, including the risk of producing incorrect or discriminatory results affecting individuals and groups when personal data is involved. The risk of discrimination was, however, not a significant concern in the public sector AI projects. Instead, other concepts such as ethics, fairness, and transparency took precedence in most of the projects surveyed here.
... In particular, researchers and practitioners recognize the need for more systematic data work as a means to improve the data used to train ML models. In fact, data is a crucial lever for an ML model to generate knowledge (Gröger 2021). Consequently, data quantity (e.g., the number of instances) and data quality (e.g., data relevance and label quality) largely influence the performance of AI-based systems (Gudivada et al. 2017). ...
Article
Full-text available
Data-centric artificial intelligence (data-centric AI) represents an emerging paradigm that emphasizes the importance of enhancing data systematically and at scale to build effective and efficient AI-based systems. The novel paradigm complements recent model-centric AI, which focuses on improving the performance of AI-based systems based on changes in the model using a fixed set of data. The objective of this article is to introduce practitioners and researchers from the field of Business and Information Systems Engineering (BISE) to data-centric AI. The paper defines relevant terms, provides key characteristics to contrast the paradigm of data-centric AI with the model-centric one, and introduces a framework to illustrate the different dimensions of data-centric AI. In addition, an overview of available tools for data-centric AI is presented and this novel paradigm is differentiated from related concepts. Finally, the paper discusses the longer-term implications of data-centric AI for the BISE community.
... This can allow researchers to dedicate their attention to the research process's more creative and strategic aspects. In this regard, Gröger (2021) proposes the "data ecosystem for industrial enterprises, a framework of data producers, data platforms, data consumers, and data roles for AI and data analytics in industrial environments." (p. ...
Article
Full-text available
Qualitative researchers can benefit from using generative artificial intelligence (GenAI), such as different versions of ChatGPT (GPT-3.5 or GPT-4), Google Bard (now renamed Gemini), and Bing Chat (now renamed Copilot), in their studies. The scientific community has used artificial intelligence (AI) tools in various ways. However, using GenAI has generated concerns regarding potential research unreliability, bias, and unethical outcomes in GenAI-generated research results. Considering these concerns, the purpose of this commentary is to review the current use of GenAI in qualitative research, including its strengths, limitations, and ethical dilemmas, from the perspective of a critical appraisal from South Asia, Nepal. I explore the controversy surrounding the proper acknowledgment of GenAI or AI use in qualitative studies and how GenAI can support or challenge qualitative studies. First, I discuss what qualitative researchers need to know about GenAI in their research. Second, I examine how GenAI can be a valuable tool in qualitative research as a co-author, a conversational platform, and a research assistant, and how it can enhance or hinder qualitative studies. Third, I address the ethical issues of using GenAI in qualitative studies. Fourth, I share my perspectives on the future of GenAI in qualitative research. I would like to recognize and record the utilization of GenAI and/or AI alongside my cognitive and evaluative abilities in constructing this critical appraisal. I offer ethical guidance on when and how to appropriately recognize the use of GenAI in qualitative studies. Finally, I offer some remarks on the implications of using GenAI in qualitative studies.
... Although BI tools are primarily descriptive in summarizing historical and present data evolution, data analytics modules can also become part of modern BI systems, enhancing them with statistical tools as well as artificial intelligence and machine learning capabilities. These tools enable deeper business insights producing business information ranging from descriptive to predictive, prescriptive, and self-explanatory (Gröger 2021). ...
Article
Full-text available
Social media platforms have become a new source of useful information for companies. Ensuring the business value of social media first requires an analysis of the quality of the relevant data and then the development of practical business intelligence solutions. This paper aims at building high-quality datasets for social business intelligence (SoBI). The proposed method offers an integrated and dynamic approach to identify the relevant quality metrics for each analysis domain. This method employs a novel multidimensional data model for the construction of cubes with impact measures for various quality metrics. In this model, quality metrics and indicators are organized in two main axes. The first one concerns the kind of facts to be extracted, namely: posts, users, and topics. The second axis refers to the quality perspectives to be assessed, namely: credibility, reputation, usefulness, and completeness. Additionally, quality cubes include a user-role dimension so that quality metrics can be evaluated in terms of the user business roles. To demonstrate the usefulness of this approach, the authors have applied their method to two separate domains: automotive business and natural disasters management. Results show that the trade-off between quantity and quality for social media data is focused on a small percentage of relevant users. Thus, data filtering can be easily performed by simply ranking the posts according to the quality metrics identified with the proposed method. As far as the authors know, this is the first approach that integrates both the extraction of analytical facts and the assessment of social media data quality in the same framework.
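As a purely illustrative aside (not the authors' implementation), the quality-cube idea of organizing scores along fact type, quality perspective, and user role can be sketched in a few lines of Python with pandas; all scores below are made up.

```python
# Illustrative sketch only: a tiny "quality cube" organizing quality scores
# along fact type, quality perspective, and user role. Values are invented.
import pandas as pd

scores = pd.DataFrame([
    # fact,    perspective,    user_role, score
    ("posts",  "credibility",  "analyst", 0.72),
    ("posts",  "completeness", "analyst", 0.55),
    ("users",  "reputation",   "analyst", 0.81),
    ("topics", "usefulness",   "manager", 0.64),
    ("users",  "reputation",   "manager", 0.78),
], columns=["fact", "perspective", "user_role", "score"])

# Pivot into a cube-like view: facts and roles on the rows, perspectives on the columns.
cube = scores.pivot_table(index=["fact", "user_role"],
                          columns="perspective",
                          values="score",
                          aggfunc="mean")
print(cube)
```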
... The fifth design principle relates to timely updates of the DQS to create trust and avoid outdated information. Frequently changing data sets (e.g., social media) and a proliferation of digital devices and sensors, producing continuous data streams, raise challenges for scalability and system integration (Gröger, 2021). Both developments call for DQS that can be frequently updated. ...
Conference Paper
Full-text available
Data products are a hot topic within companies since more and more organizations are implementing data meshes and data product management. Product management is closely related to product quality. The concept of data quality is long established, and there are several notions to measure it. However, these approaches are less practical when data is shared between different domains and organizations with undefined contexts. Like physical products, data products need a clear definition of what is inside and of what quality they are. With this article, we summarize existing approaches to data quality regarding data products, discuss how a lack of information provision restrains efficient data markets, and provide prescriptive knowledge. To do so, we identify meta-requirements from the literature and derive design principles for data scoring systems that cope with information asymmetry for data products to enable efficient data markets.
... However, because of the intangible nature of data, a data marketplace is not a traditional warehouse but rather a storefront for the available data products. Customers can select the data they want from a data catalog, and the marketplace then acts as an interface to the respective data store [75]. From a data provider perspective, one of the most important functionalities a data marketplace has to offer for this purpose is a comprehensive metadata management system that allows providers to describe their data. ...
Article
Full-text available
Currently, data are often referred to as the oil of the 21st century. This comparison is not only used to express that the resource data are just as important for the fourth industrial revolution as oil was for the technological revolution in the late 19th century. There are also further similarities between these two valuable resources in terms of their handling. Both must first be discovered and extracted from their sources. Then, the raw materials must be cleaned, preprocessed, and stored before they can finally be delivered to consumers. Despite these undeniable similarities, however, there are significant differences between oil and data in all of these processing steps, making data a resource that is considerably more challenging to handle. For instance, data sources, as well as the data themselves, are heterogeneous, which means there is no one-size-fits-all data acquisition solution. Furthermore, data can be distorted by the source or by third parties without being noticed, which affects both quality and usability. Unlike oil, there is also no uniform refinement process for data, as data preparation should be tailored to the subsequent consumers and their intended use cases. With regard to storage, it has to be taken into account that data are not consumed when they are processed or delivered to consumers, which means that the data volume that has to be managed is constantly growing. Finally, data may be subject to special constraints in terms of distribution, which may entail individual delivery plans depending on the customer and their intended purposes. Overall, it can be concluded that innovative approaches are needed for handling the resource data that address these inherent challenges. In this paper, we therefore study and discuss the relevant characteristics of data making them such a challenging resource to handle. In order to enable appropriate data provisioning, we introduce a holistic research concept from data source to data sink that respects the processing requirements of data producers as well as the quality requirements of data consumers and, moreover, ensures a trustworthy data administration.
... On the other hand, our results showed that the DT journey generates digital innovation that creates opportunities and advantages for the SMEs, all while generating huge amounts of data. These advances will require new innovations to deploy artificial intelligence and benefit from the available data, pushing businesses to invest in skilled people and new organizational capabilities to absorb this new game changer across many industries (Gröger, 2021). By understanding this dynamic, we recommend that practitioners and managers embark on a continuous innovation mode. ...
Conference Paper
Full-text available
The health crisis of the last few years has affected many enterprises, especially SMEs. Difficult access to markets, supply issues and labor problems have characterized their business environment. Despite these challenges, some SMEs decided to stand out by innovating and investing in digital technologies to develop a new way of doing business during this period. However, little is known about the benefits that these initiatives have had in creating value for these SMEs. To answer this question, we studied two SMEs that have successfully developed new products and simultaneously implemented technological initiatives to significantly improve their processes and counter the challenges related to the pandemic. Inspired by the "SA4" integrated framework of digital transformation, we were able to demonstrate that the numerous benefits that emerged from these innovations contributed to creating value for these SMEs in a crisis context.
Chapter
Full-text available
Artificial intelligence (AI) has become a revolutionary force in the rapidly changing world of modern business, presenting previously unheard-of prospects for creativity, effectiveness, and competitiveness. A company must cultivate an enterprise-wide AI culture to integrate AI successfully, which goes beyond adopting new technologies. This chapter provides an in-depth discussion of the crucial task of fostering a ubiquitous AI culture across an organisation. The chapter starts by defining the essential idea of AI culture and emphasising its relevance in determining the strategic course of businesses in the digital era. It investigates the advantages of establishing an AI culture, from improved decision-making onwards.
Article
The assumption of climate homogeneity is no longer acceptable in greenhouse farming since it can result in less-than-ideal decisions. At the same time, installing a sensor in each area of interest is costly and unsuitable for field operations. In this article, we address this problem by putting forth the idea of virtual sensors; their behavior is modeled by a context-aware recurrent neural network trained through the contextual relationships between a small set of permanent monitoring stations and a set of temporary sensors placed in specific points of interest for a short period. More precisely, we consider not only space location but also temporal features and distance with respect to the permanent sensors. This article shows the complete pipeline to configure the recurrent neural network, perform training, and deploy the resulting model into an embedded system for on-site application execution.
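For illustration only, a minimal Python/PyTorch sketch of the virtual-sensor idea follows: a small LSTM maps windows of permanent-station readings plus context features (e.g., normalized distance to each station, hour of day) to the reading expected at a point of interest. Feature choices and dimensions are assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' implementation): an LSTM-based "virtual
# sensor" regressor trained on random stand-in data. Real inputs would be
# windows of permanent-station measurements plus spatial/temporal context.
import torch
import torch.nn as nn

class VirtualSensorRNN(nn.Module):
    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # regress the target variable, e.g. temperature

    def forward(self, x):                      # x: (batch, time_steps, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])        # prediction from the last time step

model = VirtualSensorRNN(n_features=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 24, 6)   # 32 windows of 24 time steps, 6 features each (stand-in data)
y = torch.randn(32, 1)       # readings of the temporary sensor to be reproduced
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```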
Article
Advances in biomedical data science and artificial intelligence (AI) are profoundly changing the landscape of healthcare. This article reviews the ethical issues that arise with the development of AI technologies, including threats to privacy, data security, consent, and justice, as they relate to donors of tissue and data. It also considers broader societal obligations, including the importance of assessing the unintended consequences of AI research in biomedicine. In addition, this article highlights the challenge of rapid AI development against the backdrop of disparate regulatory frameworks, calling for a global approach to address concerns around data misuse, unintended surveillance, and the equitable distribution of AI's benefits and burdens. Finally, a number of potential solutions to these ethical quandaries are offered. Namely, the merits of advocating for a collaborative, informed, and flexible regulatory approach that balances innovation with individual rights and public welfare, fostering a trustworthy AI-driven healthcare ecosystem, are discussed.
Chapter
The Covid-19 pandemic of the last few years has affected many enterprises, especially SMEs. Difficult market access, supply issues, and labor problems have characterized their business environment. Despite these challenges, some SMEs decided to stand out by innovating and investing in digital technologies to develop a new way of doing business during this period. However, little is known about the benefits that these initiatives have had in creating value for these SMEs. To answer this question, we studied two SMEs that have successfully developed new products and simultaneously implemented technological initiatives to significantly improve their processes and counter the challenges related to the pandemic. Inspired by the “S^4” integrated digital transformation framework, we demonstrated that the numerous benefits that emerged from these innovations contributed to creating value for these SMEs in a COVID-19 context.
Article
Full-text available
With recent advances in artificial intelligence (AI), machine learning (ML) has been identified as particularly useful for organizations seeking to create value from data. However, as ML is commonly associated with technical professions, such as computer science and engineering, incorporating training in the use of ML into non-technical educational programs, such as social sciences courses, is challenging. Here, we present an approach to address this challenge by using no-code AI in a course for university students with diverse educational backgrounds. This approach was tested in an empirical, case-based educational setting, in which students engaged in data collection and trained ML models using a no-code AI platform. In addition, a framework consisting of five principles of instruction (problem-centered learning, activation, demonstration, application, and integration) was applied. This paper contributes to the literature on IS education by providing information for instructors on how to incorporate no-code AI in their courses and insights into the benefits and challenges of using no-code AI tools to support the ML workflow in educational settings.
Article
Full-text available
Industry 5.0 vision, a step toward the next industrial revolution and enhancement to Industry 4.0, conceives the new goals of resilient, sustainable, and human-centric approaches in diverse emerging applications such as factories-of-the-future and digital society. The vision seeks to leverage human intelligence and creativity in nexus with intelligent, efficient, and reliable cognitive collaborating robots (cobots) to achieve zero waste, zero-defect, and mass customization-based manufacturing solutions. However, it requires merging distinctive cyber-physical worlds through intelligent orchestration of various technological enablers, e.g., cognitive cobots, human-centric artificial intelligence (AI), cyber-physical systems, digital twins, hyperconverged data storage and computing, communication infrastructure, and others. In this regard, the convergence of the emerging computational intelligence (CI) paradigm and softwarized next-generation wireless networks (NGWNs) can fulfill the stringent communication and computation requirements of the technological enablers of Industry 5.0, which is the aim of this survey. In this article, we address this issue by reviewing and analyzing current emerging concepts and technologies, e.g., CI tools and frameworks, network-in-box architecture, open radio access networks, softwarized service architectures, potential enabling services, and others, elemental and holistic for designing the objectives of CI-NGWNs to fulfill the Industry 5.0 vision requirements. Furthermore, we outline and discuss ongoing initiatives, demos, and frameworks linked to Industry 5.0. Finally, we provide a list of lessons learned from our detailed review, research challenges, and open issues that should be addressed in CI-NGWNs to realize Industry 5.0.
Article
Digitalization is leading to the emergence of new data-driven business models in the automotive industry, such as the connected car and free-floating car sharing. For these business models, the continuous collection, integration, and analysis of user data by means of data analytics is essential, as exemplified by the data-driven connected car service of real-time navigation. The spread of such data-driven business models not only raises technological questions for computer science and information systems research but also confronts tax law scholarship with new challenges. This article first introduces the relevant fundamental principles of the international corporate taxation of data-driven business models. Building on this, the representative example of the connected car business model is used to analyze the challenges that the characteristics of data-driven business models create for the taxation of an automotive manufacturer. In addition, the tax implications of intra-group data transfers in the connected car business model are critically examined. The article thus highlights both the need for tax-law analysis as part of data projects and specific needs for clarification and reform in international tax law.
Article
Full-text available
The battery industry has been growing fast because of strong demand from electric vehicle and power storage applications. Laser welding is a key process in battery manufacturing. To control production quality, the industry has a great desire for defect inspection of automated laser welding. Recently, Convolutional Neural Networks (CNNs) have been applied with great success for detection, recognition, and classification. In this paper, using transfer learning theory and a pre-training approach with the Visual Geometry Group (VGG) model, we propose an optimized VGG model to improve the efficiency of defect classification. Our model was applied on an industrial computer with images taken from a battery manufacturing production line and achieved a testing accuracy of 99.87%. The main contributions of this study are as follows: (1) We proved that the optimized VGG model, which was trained on a large image database, can be used for the defect classification of laser welding. (2) We demonstrated that the pre-trained VGG model has a small model size, a lower false positive rate, and shorter training and prediction times, so it is more suitable for quality inspection in an industrial environment. Additionally, we visualized the convolutional layer and max-pooling layer to make it easy to view and optimize the model.
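A minimal transfer-learning sketch in Python/Keras along the lines described above is shown below; it is illustrative only (paths, layer sizes, and hyperparameters are assumptions, not the paper's exact setup).

```python
# Minimal sketch (assumptions, not the paper's setup): transfer learning with a
# pre-trained VGG16 backbone for binary weld-defect classification. Directory
# layout and hyperparameters are hypothetical; in practice the official VGG16
# preprocessing (tf.keras.applications.vgg16.preprocess_input) would be applied.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep pre-trained convolutional features frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),  # defect vs. no defect
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical directories with weld images sorted into class subfolders.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "weld_images/train", image_size=(224, 224), batch_size=32, label_mode="binary")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "weld_images/val", image_size=(224, 224), batch_size=32, label_mode="binary")

model.fit(train_ds, validation_data=val_ds, epochs=5)
```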
Article
Full-text available
Data governance refers to the exercise of authority and control over the management of data. The purpose of data governance is to increase the value of data and minimize data-related cost and risk. Despite data governance gaining in importance in recent years, a holistic view on data governance, which could guide both practitioners and researchers, is missing. In this review paper, we aim to close this gap and develop a conceptual framework for data governance, synthesize the literature, and provide a research agenda. We base our work on a structured literature review including 145 research papers and practitioner publications published during 2001-2019. We identify the major building blocks of data governance and decompose them along six dimensions. The paper supports future research on data governance by identifying five research areas and displaying a total of 15 research questions. Furthermore, the conceptual framework provides an overview of antecedents, scoping parameters, and governance mechanisms to assist practitioners in approaching data governance in a structured manner.
Conference Paper
Full-text available
Data lakes have established themselves in industrial practice as platforms for storing and analyzing all kinds of (raw) data. Extended requirements regarding governance and self-service make metadata management in the data lake a critical success factor. So far, however, there is little scientific work on this topic; in particular, a holistic view of the design and realization of metadata management in the data lake is missing. This work addresses the topic and is based on practical experience gained in an industrial group while building an enterprise-wide data lake. Practical requirements and application examples for metadata management in the data lake are discussed, and the different types of metadata are analyzed on the basis of the practical example. For the realization of metadata management, different IT tools are then analyzed against defined criteria. The analysis shows that data catalogs are in principle the appropriate type of tool, although technical shortcomings still exist. Finally, the challenges that exist in practice for holistic metadata management in the data lake are summarized and future research needs are identified.
Article
Full-text available
Smart manufacturing is strongly correlated with the digitization of all manufacturing activities. This increases the amount of data available to drive productivity and profit through data-driven decision making programs. The goal of this article is to assist data engineers in designing big data analysis pipelines for manufacturing process data. Thus, this paper characterizes the requirements for process data analysis pipelines and surveys existing platforms from academic literature. The results demonstrate a stronger focus on the storage and analysis phases of pipelines than on the ingestion, communication, and visualization stages. Results also show a tendency towards custom tools for ingestion and visualization, and relational data tools for storage and analysis. Tools for handling heterogeneous data are generally well-represented throughout the pipeline. Finally, batch processing tools are more widely adopted than real-time stream processing frameworks, and most pipelines opt for a common script-based data processing approach. Based on these results, recommendations are offered for each phase of the pipeline.
Article
Full-text available
Data Ecosystems are socio-technical complex networks in which actors interact and collaborate with each other to find, archive, publish, consume, or reuse data as well as to foster innovation, create value, and support new businesses. While the Data Ecosystem field is thus arguably gaining in importance, research on this subject is still in its early stages of development. Up until now, not many academic papers related to Data Ecosystems have been published. Furthermore, to the best of our knowledge, there has been no systematic review of the literature on Data Ecosystems. In this study, we provide an overview of the current literature on Data Ecosystems by conducting a systematic mapping study. This study is intended to function as a snapshot of the research in the field and by doing so identifies the different definitions of Data Ecosystem and analyzes the evolution of Data Ecosystem research. The studies selected have been classified into categories related to the study method, contribution, research topic, and ecosystem domains. Finally, we analyze how Data Ecosystems are structured and organized, and what benefits can be expected from Data Ecosystems and what their limitations are.
Article
Full-text available
The recent White House report on Artificial Intelligence (AI) (Lee, 2016) highlights the significance of AI and the necessity of a clear roadmap and strategic investment in this area. As AI emerges from science fiction to become the frontier of world-changing technologies, there is an urgent need for systematic development and implementation of AI to see its real impact in the next generation of industrial systems, namely Industry 4.0. Within the 5C architecture previously proposed in Lee et al. (2015), this paper provides an insight into the current state of AI technologies and the eco-system required to harness the power of AI in industrial applications.
Conference Paper
Full-text available
With the advances in communication technologies and the high amount of data generated, collected, and stored, it becomes crucial to manage the quality of this data deluge in an efficient and cost-effective way. Storage, processing, privacy, and analytics are the key challenging aspects of Big Data that require quality evaluation and monitoring. Quality has been recognized by the Big Data community as an essential facet of its maturity. Yet, it is a crucial practice that should be implemented at the earlier stages of the lifecycle and progressively applied across the other key processes. The earlier we incorporate quality, the fuller the benefit we can get from insights. In this paper, we first identify the key challenges that necessitate quality evaluation. We then survey, classify and discuss the most recent work on Big Data management. Consequently, we propose an across-the-board quality management framework describing the key quality evaluation practices to be conducted through the different Big Data stages. The framework can be used to leverage quality management and to provide a roadmap for data scientists to better understand quality practices and highlight the importance of managing quality. Finally, we conclude the paper and point to some future research directions on the quality of Big Data.
Article
Full-text available
The 21st century has ushered in the age of big data and data economy, in which data DNA, which carries important knowledge, insights, and potential, has become an intrinsic constituent of all data-based organisms. An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics. Although it is widely debated whether big data is only hype and buzz, and data science is still in a very early phase, significant challenges and opportunities are emerging or have been inspired by the research, innovation, business, profession, and education of data science. This article provides a comprehensive survey and tutorial of the fundamental aspects of data science: the evolution from data analysis to data science, the data science concepts, a big picture of the era of data science, the major challenges and directions in data innovation, the nature of data analytics, new industrialization and service opportunities in the data economy, the profession and competency of data education, and the future of data science. This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons, and thinking about data science and analytics.
Article
Full-text available
This study reports on the findings from Part 2 of a small-scale analysis of requirements for real-world data science positions and examines three further data science roles: data analyst, data engineer and data journalist. The study examines recent job descriptions and maps their requirements to the current curriculum within the graduate MLIS and Information Science and Technology Masters Programs in the School of Information Sciences (iSchool) at the University of Pittsburgh. From this mapping exercise, model ‘course pathways’ and module ‘stepping stones’ have been identified, as well as course topic gaps and opportunities for collaboration with other Schools. Competency in four specific tools or technologies was required by all three roles (Microsoft Excel, R, Python and SQL), as well as collaborative skills (with both teams of colleagues and with clients). The ability to connect the educational curriculum with real-world positions is viewed as further validation of the translational approach being developed as a foundational principle of the current MLIS curriculum review process.
Article
Full-text available
Today, data is generated and consumed at unprecedented scale. This has led to novel approaches for scalable data management subsumed under the term “NoSQL” database systems to handle the ever-increasing data volume and request loads. However, the heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context. Therefore, this article gives a top-down overview of the field: instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases. This NoSQL Toolbox allows us to derive a simple decision tree to help practitioners and researchers filter potential system candidates based on central application requirements.
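Purely as an illustration of the requirement-driven filtering idea (not the NoSQL Toolbox itself), the following Python sketch shortlists system classes whose assumed capability tags cover a given set of application requirements.

```python
# Illustrative sketch only: filter candidate data-store classes by matching
# application requirements against simplified, assumed capability tags.
CANDIDATES = {
    "Key-value store":   {"scalability", "low_latency"},
    "Wide-column store": {"scalability", "write_throughput", "sorted_access"},
    "Document store":    {"flexible_schema", "secondary_indexes", "query_richness"},
    "Graph database":    {"relationship_traversal", "query_richness"},
    "Relational DBMS":   {"acid_transactions", "joins", "query_richness"},
}

def shortlist(required: set) -> list:
    """Return system classes whose capability set covers all requirements."""
    return [name for name, caps in CANDIDATES.items() if required <= caps]

if __name__ == "__main__":
    # Example: an application needing rich queries over a flexible schema.
    print(shortlist({"flexible_schema", "query_richness"}))   # ['Document store']
```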
Conference Paper
Full-text available
The rhetoric of open government data (OGD) promises that data transparency will lead to multiple public benefits: economic and social innovation, civic participation, public-private collaboration, and public accountability. In reality much less has been accomplished in practice than advocates have hoped. OGD research to address this gap tends to fall into two streams – one that focuses on data publication and re-use for purposes of innovation, and one that views publication as a stimulus for civic participation and government accountability - with little attention to whether or how these two views interact. In this paper we use an ecosystem perspective to explore this question. Through an exploratory case study we show how two related cycles of influences can flow from open data publication. The first addresses transparency for innovation goals, the second addresses larger issues of data use for public engagement and greater government accountability. Together they help explain the potential and also the barriers to reaching both kinds of goals.
Conference Paper
Full-text available
Open data marketplaces have emerged as a mode of addressing open data adoption barriers. However, knowledge of how such marketplaces affect digital service innovation in open data ecosystems is limited. This paper explores their value proposition for open data users based on an exploratory case study. Five prominent perceived values are identified: lower task complexity, higher access to knowledge, increased possibilities to influence, lower risk and higher visibility. The impact on open data adoption barriers is analyzed and the consequences for ecosystem sustainability are discussed. The paper concludes that open data marketplaces can lower the threshold of using open data by providing better access to open data and associated support services, and by increasing knowledge transfer within the ecosystem.
Article
Full-text available
Data is becoming more and more of a commodity, so that it is not surprising that data has reached the status of tradable goods. An increasing number of data providers is recognizing this and is consequently setting up platforms that deserve the term “marketplace” for data. We identify several categories and dimensions of data marketplaces and data vendors and provide a survey of the current situation.
Article
Full-text available
The article reports on the enhancement of data quality in data warehouse environments. Here, a conceptual framework is offered for enhancing data quality in data warehouse environments. Factors explored include the current level of data quality, the levels of quality needed by the relevant decision processes, and the potential benefits of projects designed to enhance data quality. Those who are responsible for data quality have to understand the importance of such factors. For warehouses supporting a limited number of decision processes, awareness of these issues coupled with good judgment should suffice. Data warehousing efforts may not succeed for various reasons, but nothing is more certain to yield failure than a lack of concern for the quality of the data. Data supporting organizational activities in a meaningful way should be warehoused. A distinguishing characteristic of warehoused data is that it is used for decision making rather than for operations. Data warehousing efforts have to address several potential problems.
Article
Creativity is the act of turning new and imaginative ideas into reality, and it is characterized by the ability to perceive the world in new ways. Artificial Intelligence (AI) is a wide-ranging branch of computer science concerned with building intelligent machines capable of performing tasks. AI approaches can generate new ideas in three ways: by generating possible combinations of previously known ideas, by generating novel combinations of previously known ideas, and by generating unique combinations of previously known concepts. AI is one of those buzzwords that comes up all the time. As a field of study, AI and specialized programming software seek to make computers smarter. AI has become ubiquitous, applied in marketing, shopping, and fashion; it can also be used to detect credit card fraud or to support driving. It seems logical that, one day, AI will be part of every computer.
Chapter
Data lakes have become popular to enable organization-wide analytics on heterogeneous data from multiple sources. Data lakes store data in their raw format and are often characterized as schema-free. Nevertheless, it turned out that data still need to be modeled, as neglecting data modeling may lead to issues concerning e.g., quality and integration. In current research literature and industry practice, Data Vault is a popular modeling technique for structured data in data lakes. It promises a flexible, extensible data model that preserves data in their raw format. However, hardly any research or assessment exist on the practical usage of Data Vault for modeling data lakes. In this paper, we assess the Data Vault model’s suitability for the data lake context, present lessons learned, and investigate success factors for the use of Data Vault. Our discussion is based on the practical usage of Data Vault in a large, global manufacturer’s data lake and the insights gained in real-world analytics projects.
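For readers unfamiliar with the technique, the following sketch shows the three core Data Vault constructs (hubs, links, satellites) as relational tables, created in SQLite from Python purely for illustration; business keys and attributes are hypothetical and simplified relative to Data Vault 2.0 practice.

```python
# Minimal sketch of the core Data Vault constructs (hub, link, satellite) as
# relational tables, created in SQLite for illustration. Business keys and
# attributes are hypothetical; a production model would follow Data Vault 2.0
# conventions (hash keys, load metadata) in the lake's own storage layer.
import sqlite3

ddl = """
CREATE TABLE hub_machine (
    machine_hk   TEXT PRIMARY KEY,          -- hash of the business key
    machine_id   TEXT NOT NULL,             -- business key from the source system
    load_ts      TEXT NOT NULL,
    record_src   TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk     TEXT PRIMARY KEY,
    order_no     TEXT NOT NULL,
    load_ts      TEXT NOT NULL,
    record_src   TEXT NOT NULL
);
CREATE TABLE link_machine_order (           -- relationship between the two hubs
    machine_order_hk TEXT PRIMARY KEY,
    machine_hk       TEXT REFERENCES hub_machine(machine_hk),
    order_hk         TEXT REFERENCES hub_order(order_hk),
    load_ts          TEXT NOT NULL,
    record_src       TEXT NOT NULL
);
CREATE TABLE sat_machine_status (           -- descriptive, history-keeping attributes
    machine_hk   TEXT REFERENCES hub_machine(machine_hk),
    load_ts      TEXT NOT NULL,
    status       TEXT,
    temperature  REAL,
    record_src   TEXT NOT NULL,
    PRIMARY KEY (machine_hk, load_ts)
);
"""

with sqlite3.connect(":memory:") as conn:
    conn.executescript(ddl)
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```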
Article
Advanced manufacturing is one of the core national strategies in the US (AMP), Germany (Industry 4.0) and China (Made in China 2025). The emergence of the concept of Cyber Physical Systems (CPS) and big data imperatively enables manufacturing to become smarter and more competitive among nations. Many researchers have proposed new solutions with big data enabling tools for manufacturing applications in three directions: product, production and business. Big data has been a fast-changing research area with many new opportunities for applications in manufacturing. This paper presents a systematic literature review of the state-of-the-art of big data in manufacturing. Six key drivers of big data applications in manufacturing have been identified. The key drivers are system integration, data, prediction, sustainability, resource sharing and hardware. Based on the requirements of manufacturing, nine essential components of the big data ecosystem are captured. They are data ingestion, storage, computing, analytics, visualization, management, workflow, infrastructure and security. Several research domains are identified that are driven by available capabilities of the big data ecosystem. Five future directions of big data applications in manufacturing are presented, from modelling and simulation to real-time big data analytics and cybersecurity.
Conference Paper
The digital transformation leads to massive amounts of heterogeneous data challenging traditional data warehouse solutions in enterprises. In order to exploit these complex data for competitive advantages, the data lake recently emerged as a concept for more flexible and powerful data analytics. However, existing literature on data lakes is rather vague and incomplete, and the various realization approaches that have been proposed neither cover all aspects of data lakes nor do they provide a comprehensive design and realization strategy. Hence, enterprises face multiple challenges when building data lakes. To address these shortcomings, we investigate existing data lake literature and discuss various design and realization aspects for data lakes, such as governance or data models. Based on these insights, we identify challenges and research gaps concerning (1) data lake architecture, (2) data lake governance, and (3) a comprehensive strategy to realize data lakes. These challenges still need to be addressed to successfully leverage the data lake in practice.
Article
The ecosystem of big data technologies and advanced analytics tools has evolved rapidly in recent years, offering companies new possibilities for digital transformation and data-driven solutions. Industry 4.0 represents a major application domain for big data and advanced analytics in order to exploit the huge amounts of data generated across the industrial value chain. However, building and establishing an Industry 4.0 analytics platform involves far more than tools and technology. In this paper, we report on our practical experiences when building the Bosch Industry 4.0 Analytics Platform and discuss challenges, approaches and future research directions. The analytics platform is designed for more than 270 factories as part of Bosch's worldwide manufacturing network. We describe use cases and requirements for the analytics platform and present its architecture. On this basis, we discuss practical challenges related to analytical solution development, employee enablement, i.e., citizen data science, as well as analytics governance, and present initial solution approaches. Thereby, we highlight future research directions in order to leverage advanced analytics and big data in industrial enterprises.
Article
By moving data into a centralized, scalable storage location inside an organization – the data lake – companies and other institutions aim to discover new information and to generate value from the data. The data lake can help to overcome organizational boundaries and system complexity. However, to generate value from the data, additional techniques, tools, and processes need to be established which help to overcome data integration and other challenges around this approach. Although there is a certain agreed-on notion of the central idea, there is no accepted definition of what components or functionality a data lake has or what an architecture looks like. Throughout this article, we will start with the central idea and discuss various related aspects and technologies.
Article
The advent of big data is fundamentally changing the business landscape. We open the ‘black box’ of the firm to explore how firms transform big data in order to create value and why firms differ in their abilities to create value from big data. Grounded in detailed evidence from China, the world’s largest digital market, where many firms actively engage in value creation activities from big data, we identify several novel features. We find that it is not the data itself, or individual data scientists, that generate value creation opportunities. Rather, value creation occurs through the process of data management, where managers are able to democratize, contextualize, experiment and execute data insights in a timely manner. We add richness to current theory by developing a conceptual framework of value creation from big data. We also identify avenues for future research and implications for practicing managers.
Chapter
The Internet of Things (IoT) is an information network of physical objects (sensors, machines, cars, buildings, and other items) that allows interaction and cooperation of these objects to reach common goals [2]. While the IoT affects, among others, transportation, healthcare, and smart homes, the Industrial Internet of Things (IIoT) refers in particular to industrial environments. In this context, Cyber Manufacturing Systems (CMS) have evolved as a significant term. This opening chapter gives a brief introduction to the development of the IIoT, also introducing the Digital Factory and cyber-physical systems. Furthermore, the challenges and requirements of IIoT and CMS are discussed, and potentials regarding their application in Industry 4.0 are identified. In this process, aspects such as economic impact, architectural patterns, and infrastructures are taken into account. Besides, major research initiatives are also presented. In addition, an orientation is given to the reader in this chapter by providing brief summaries of the chapters published in this book. Hereby, the following research areas are addressed: “Modeling for CPS and CMS”, “Architectural Design Patterns for CMS and IIoT”, “Communication and Networking”, “Artificial Intelligence and Analytics”, and “Evolution of Workforce and Human-Machine-Interaction”. The chapter closes with a discussion of future trends of IIoT and CMS within Industry 4.0.
Conference Paper
Detecting and repairing dirty data is one of the perennial challenges in data analytics, and failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few years, there has been a surge of interest from both industry and academia on data cleaning problems including new abstractions, interfaces, approaches for scalability, and statistical techniques. To better understand the new advances in the field, we will first present a taxonomy of the data cleaning literature in which we highlight the recent interest in techniques that use constraints, rules, or patterns to detect errors, which we call qualitative data cleaning. We will describe the state-of-the-art techniques and also highlight their limitations with a series of illustrative examples. While traditionally such approaches are distinct from quantitative approaches such as outlier detection, we also discuss recent work that casts such approaches into a statistical estimation framework including: using Machine Learning to improve the efficiency and accuracy of data cleaning and considering the effects of data cleaning on statistical analysis.
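As a small illustration of rule- or constraint-based ("qualitative") error detection, the following Python/pandas sketch checks a made-up table against a domain constraint and a functional dependency; it is not taken from the cited tutorial.

```python
# Minimal sketch of qualitative data cleaning: rule- and constraint-based error
# detection with pandas. The data and rules (a value-range constraint and a
# functional dependency zip -> city) are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "zip":      ["70469", "70469", "10115", "10115"],
    "city":     ["Stuttgart", "Stuttgart", "Berlin", "Munich"],  # violates zip -> city
    "quantity": [5, -2, 7, 3],                                   # negative quantity is invalid
})

errors = []

# Rule 1: domain constraint, quantities must be positive.
bad_qty = df[df["quantity"] <= 0]
errors += [f"row {i}: non-positive quantity {q}" for i, q in bad_qty["quantity"].items()]

# Rule 2: functional dependency zip -> city (each zip must map to exactly one city).
cities_per_zip = df.groupby("zip")["city"].nunique()
for z in cities_per_zip[cities_per_zip > 1].index:
    conflicting = sorted(df.loc[df["zip"] == z, "city"].unique())
    errors.append(f"zip {z}: conflicting city values {conflicting}")

print("\n".join(errors))
```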
Article
We illustrate the usefulness of an Ontology-Based Data Management (OBDM) approach to develop an open information system, allowing for a deep level of interoperability among different databases, and accounting for additional dimensions of data quality compared to the standard dimensions of the OECD (Quality framework and guidelines for OECD statistical activities, OECD Publishing, Paris, 2011) Quality Framework. Recent advances in engineering in computer science provide promising tools to solve some of the crucial issues in data integration for Research and Innovation.
Book
The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large-size corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss: how to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes; important data warehouse technologies and practices; and Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture. The book provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast; explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse; demystifies Data Vault modeling with beginning, intermediate, and advanced techniques; and discusses the advantages of the Data Vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0.
Book
This book presents and discusses the main strategic and organizational challenges posed by Big Data and analytics in a manner relevant to both practitioners and scholars. The first part of the book analyzes strategic issues relating to the growing relevance of Big Data and analytics for competitive advantage, which is also attributable to empowerment of activities such as consumer profiling, market segmentation, and development of new products or services. Detailed consideration is also given to the strategic impact of Big Data and analytics on innovation in domains such as government and education and to Big Data-driven business models. The second part of the book addresses the impact of Big Data and analytics on management and organizations, focusing on challenges for governance, evaluation, and change management, while the concluding part reviews real examples of Big Data and analytics innovation at the global level. The text is supported by informative illustrations and case studies, so that practitioners can use the book as a toolbox to improve understanding and exploit business opportunities related to Big Data and analytics.
Article
Throughout the history of the artificial intelligence movement, researchers have strived to create computers that could simulate general human intelligence. This paper argues that workers in artificial intelligence have failed to achieve this goal because they adopted the wrong model of human behavior and intelligence, namely a cognitive essentialist model with origins in the traditional philosophies of natural intelligence. An analysis of the word "intelligence" suggests that it originally referred to behavior-environment relations and not to inferred internal structures and processes. It is concluded that if workers in artificial intelligence are to succeed in their general goal, then they must design machines that are adaptive, that is, that can learn. Thus, artificial intelligence researchers must discard their essentialist model of natural intelligence and adopt a selectionist model instead. Such a strategic change should lead them to the science of behavior analysis.
Article
Contents: 1. Organizations, management, and the networked enterprise; 2. Information technology infrastructure; 3. Key system applications for the digital age; 4. Building and managing systems.