
Lisa Ehrlinger, Dr. techn.
Software Competence Center Hagenberg | SCCH
About
Publications: 39
Reads: 40,525
Citations: 834
Introduction
Additional affiliations
September 2014 - July 2022
Publications (39)
Most existing methodologies agree that the assessment of data quality (DQ) is a cyclic process, which has to be carried out continuously. Nevertheless, the majority of DQ tools allow the evaluation of data sources only at specific points in time, and automation and scheduling are therefore the responsibility of the user. In contrast, automate...
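For illustration only, the following minimal Python sketch shows what a continuously scheduled quality check could look like; the check_completeness function, the placeholder data source, and the interval are assumptions and are not taken from the tool described above.

    # Minimal sketch (assumption, not a cited tool's implementation): run a
    # data quality check on a fixed schedule instead of only once on demand.
    import time

    def check_completeness(records):
        # Share of non-missing values across all fields of all records.
        cells = [v for r in records for v in r.values()]
        return sum(v is not None for v in cells) / len(cells) if cells else 1.0

    def load_records():
        # Placeholder data source; in practice this would query a database.
        return [{"id": 1, "name": "pump"}, {"id": 2, "name": None}]

    if __name__ == "__main__":
        for _ in range(3):                  # in production: run indefinitely
            score = check_completeness(load_records())
            print(f"completeness = {score:.2f}")
            time.sleep(5)                   # assumed measurement interval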
Recently, the term knowledge graph has been used frequently in research and business, usually in close association with Semantic Web technologies, linked data, large-scale data analytics and cloud computing. Its popularity is clearly influenced by the introduction of Google's Knowledge Graph in 2012, and since then the term has been widely used wit...
Data is central to decision-making in enterprises and organizations (e.g., smart factories and predictive maintenance), as well as in private life (e.g., booking platforms). Especially in artificial intelligence applications, like self-driving cars, trust in data-driven decisions depends directly on the quality of the underlying data. Therefore, it...
Training machine learning models, especially in producing enterprises with numerous information systems having different data structures, requires efficient data access. Hence, standardized descriptions of data sources and their data structures are a fundamental requirement. We therefore introduce version 4.0 of the Data Source Description Vocabula...
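As a rough illustration of the kind of structural metadata a standardized data source description captures, the following sketch describes a single source; all field names here are hypothetical and are not terms of the DSD vocabulary itself.

    # Hypothetical, simplified description of one data source and its structure
    # (field names are illustrative, not taken from the DSD vocabulary).
    data_source = {
        "name": "production_db",
        "type": "relational",
        "connection": "postgresql://host:5432/production",
        "schemas": [
            {
                "table": "sensor_readings",
                "columns": [
                    {"name": "machine_id", "datatype": "integer", "nullable": False},
                    {"name": "measured_at", "datatype": "timestamp", "nullable": False},
                    {"name": "temperature", "datatype": "float", "nullable": True},
                ],
            }
        ],
    }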
Data catalogs automatically collect metadata from distributed data sources and provide a unified and easily accessible view on the data. Many existing data catalog tools focus on the automatic collection of technical metadata (e.g., from a data dictionary) into a central repository. The functionality of annotating data with semantics (i.e., its mea...
Data quality is of central importance for the qualitative evaluation of decisions taken by AI-based applications. In practice, data from several heterogeneous data sources is integrated, but complete, global domain knowledge is often not available. In such heterogeneous scenarios, it is particularly difficult to monitor data quality (e.g., complete...
Technical standards help software architects to identify relevant requirements and to facilitate system certification, i.e., to systematically assess whether a system meets critical requirements in fields like security, safety, or interoperability. Despite their usefulness, standards typically remain vague on how requirements should be addressed vi...
In the last two decades, computing and storage technologies have experienced enormous advances. Leveraging these recent advances, Artificial Intelligence (AI) is making the leap from traditional classification use cases to automation of complex systems through advanced machine learning and reasoning algorithms. While the literature on AI algorithms...
High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practical scenarios, data quality is typically associated with data preprocessing, profiling, and cleansing for subsequent tasks like data integration or data analytics. However, from a scientific perspective, a lot of res...
Temporal knowledge graphs make it possible to store process data in a natural way, since they also model the time aspect. One example of such data is registration processes in the area of intellectual property protection. A common question in such settings is how to predict the future behavior of a (yet unfinished) process. However, traditional process mining tec...
Data integration, data management, and data quality assurance are essential tasks in any data science project. However, these tasks are often not treated with the same priority as core data analytics tasks, such as the training of statistical models. One reason is that data analytics generate directly reportable results and data management is only...
Data management approaches have changed drastically in the past few years due to improved data availability and increasing interest in data analysis (e.g., artificial intelligence). The volume, velocity, and variety of data requires novel and automated ways to "operate" this data. In accordance with software development, where DevOps is the de-fact...
In enterprises, data is usually distributed across multiple data sources and stored in heterogeneous formats. The harmonization and integration of data is a prerequisite to leverage it for AI initiatives. Recently, data catalogs have emerged as a promising solution to semantically classify and organize data sources across different environments and to enrich...
Newsadoo is a media startup that provides news articles from different sources on a single platform. Users can create individual timelines, where they follow the latest developments of a specific topic. To support the topic creation process, we developed an algorithm that automatically suggests related tags for a set of given reference tags. In this...
Data quality assessment is a challenging but necessary task to ensure that business decisions that are derived from data can be trusted. A number of data quality metrics have been developed to measure dimensions like accuracy, completeness, and timeliness. The tool QuaIIe (developed as part of our previous research) facilitates the calculation of d...
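For illustration, the following pandas sketch computes simple completeness and timeliness scores; the formulas are common textbook-style ratios and are assumptions here, not necessarily those implemented in QuaIIe.

    # Illustrative dimension metrics (textbook-style ratios, not QuaIIe's formulas).
    import pandas as pd

    df = pd.DataFrame({
        "sensor": ["a", "b", "c"],
        "value": [1.2, None, 3.4],
        "updated": pd.to_datetime(["2024-01-01", "2024-01-02", "2023-06-01"]),
    })

    completeness = df["value"].notna().mean()                 # share of non-null values
    age_days = (pd.Timestamp("2024-01-03") - df["updated"]).dt.days
    timeliness = (age_days <= 30).mean()                      # share of records fresher than 30 days

    print(f"completeness = {completeness:.2f}, timeliness = {timeliness:.2f}")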
Knowledge graphs in manufacturing and production aim to make production lines more efficient and flexible with higher quality output. This makes knowledge graphs attractive for companies to reach Industry 4.0 goals. However, existing research in the field is quite preliminary, and more research effort on analyzing how knowledge graphs can be applie...
In order to make good decisions, the data used for decision-making needs to be of high quality. As the volume of data continually increases, ensuring high data quality is a major challenge today and needs to be automated with tools. The goal of the Data Quality Library (DaQL) is to provide a tool to continuously ensure and measure data quality as...
High data quality (e.g., completeness, accuracy, non-redundancy) is essential to ensure the trustworthiness of AI applications. In such applications, huge amounts of data are integrated from different heterogeneous sources, and complete, global domain knowledge is often not available. This scenario has a number of negative effects, in particular, it...
The main challenges are discussed together with the lessons learned from past and ongoing research along the development cycle of machine learning systems. This is done by taking into account intrinsic conditions of today's deep learning models, data and software quality issues, and human-centered artificial intelligence (AI) postulates, inclu...
Industrial production processes generate huge amounts of streaming data, usually collected by the deployed machines. To allow the analysis of this data (e.g., for process stability monitoring or predictive maintenance), it is necessary that the data streams are of high quality and comparable between machines. A common problem in such scenarios is s...
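As a rough illustration of checking comparability between machines (an assumed approach, not the method proposed in the paper), the following sketch compares two simulated sensor streams with a two-sample Kolmogorov-Smirnov test.

    # Minimal sketch: test whether two machines produce comparable value
    # distributions for the same signal (data is simulated for illustration).
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    machine_a = rng.normal(loc=20.0, scale=1.0, size=500)   # simulated sensor stream
    machine_b = rng.normal(loc=23.5, scale=1.0, size=500)   # same signal, shifted

    stat, p_value = ks_2samp(machine_a, machine_b)
    print(f"KS statistic = {stat:.3f}, p = {p_value:.3g}")  # small p: streams differ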
The main challenges along with lessons learned from ongoing research in the application of machine learning systems in practice are discussed, taking into account aspects of theoretical foundations, systems engineering, and human-centered AI postulates. The analysis outlines a fundamental theory-practice gap which superimposes the challenges of AI...
Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research about data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time mea...
The reliability and trustworthiness of machine learning models depends directly on the data used to train them. Knowledge about data defects that affect machine learning models is most often considered implicitly by data analysts, but usually no centralized data defect management exists. Knowledge graphs are a powerful tool to capture, structure, e...
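As a small, hypothetical illustration of capturing data defect knowledge in a graph (the namespace and property names are invented for this sketch), a few triples could be recorded with rdflib as follows.

    # Hypothetical sketch: record a known data defect and the model it affects
    # as triples in a small RDF graph.
    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/defects#")
    g = Graph()

    g.add((EX.MissingTemperature, EX.defectType, Literal("missing values")))
    g.add((EX.MissingTemperature, EX.affectsFeature, Literal("temperature")))
    g.add((EX.MissingTemperature, EX.impactsModel, EX.PredictiveMaintenanceModel))

    for s, p, o in g:
        print(s, p, o)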
Data quality measurement is a critical success factor to estimate the explanatory power of data-driven decisions. Several data quality dimensions, such as completeness, accuracy, and timeliness, have been investigated so far and metrics for their measurement have been proposed. While most research into those dimensions refers to the data values, sc...
The development of well-founded metrics to measure data quality is essential to estimate the significance of data-driven decisions, which are, besides others, the basis for artificial intelligence applications. While the majority of research into data quality refers to the data values of an information system, less research is concerned with schema...
Assessing the quality of information system schemas is crucial, because an unoptimized or erroneous schema design has a strong impact on the quality of the stored data, e.g., it may lead to inconsistencies and anomalies at the data-level. Even if the initial schema had an ideal design, changes during the life cycle can negatively affect the schema...
Data quality measurement is essential to gain knowledge about data used for decision-making and to evaluate the trustworthiness of those decisions. Example applications based on automated decision-making are self-driving cars, smart factories, and weather forecasting. One-time data quality measurement is an important starting point for any...
With the advent of Industry 4.0, many companies aim at analyzing historically collected or operative transaction data. Despite the availability of large amounts of data, missing values in particular can introduce bias or preclude the use of specific data analytics methods. Historically, a lot of research into missing data comes from the social science...
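For illustration, the following short pandas sketch quantifies missing values per column before deciding on an analysis or imputation strategy; the data and the 25% threshold are illustrative assumptions, not values from the paper.

    # Short sketch: quantify missing values per column before choosing an
    # imputation or analysis strategy.
    import pandas as pd

    df = pd.DataFrame({
        "pressure": [1.1, None, 1.3, None],
        "temperature": [20.5, 21.0, None, 22.1],
        "machine_id": [1, 2, 3, 4],
    })

    missing_ratio = df.isna().mean()            # share of missing values per column
    print(missing_ratio)
    print(missing_ratio[missing_ratio > 0.25])  # columns that may bias the analysis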