13 Publications
1,493 Reads
59 Citations
Publications (13)
Every day, organizations following a collect-everything mentality generate, process, and store an ever-increasing amount of data. With the increasing amount of data available, it is becoming more difficult and complex to analyze it in a way that creates beneficial knowledge. This analysis is usually done via queries, i.e., the queries themselves al...
Data science must respect privacy in many situations. We have built a query repository with automatic SQL query classification according to data-privacy directives. It can intercept queries that violate the directives, since a JDBC proxy driver inserted between the end-users’ SQL tooling and the target data consults the repository for the complianc...
Digitization within the framework of Industry 4.0 is considered the biggest and fastest driver of change in the history of the manufacturing industry. While the size of a company is becoming less essential, the ability to adapt quickly to changing market conditions and new technologies is more important than ever. This trend particularly applies to the com...
Complex data analysis scenarios often require discovering and combining multiple data sources. Data scientists usually formulate a series of SQL queries building on each other, also called a session, to iteratively derive results. However, due to a lack of familiarity with data sources or the complexity of query results, it can be a hard task to de...
SQL queries encapsulate the knowledge of their authors about the usage of the queried data sources. This knowledge also contains aspects that cannot be inferred by analyzing the contents of the queried data sources alone. Due to the complexity of analytical SQL queries, specialized mechanisms are necessary to enable the user-friendly formulation of...
Analytical SQL queries are a valuable source of information. Query log analysis can provide insight into the usage of datasets and uncover knowledge that cannot be inferred from source schemas or content alone. To unlock this potential, flexible mechanisms for meta-querying are required. Syntactic and semantic aspects of queries must be considered...
Writing effective analytical queries requires data scientists to have in-depth knowledge of the existence, semantics, and usage context of data sources. Once gathered, such knowledge is informally shared within a specific team of data scientists, but usually is neither formalized nor shared with other teams. Potential synergies remain unused. We in...
We introduce Query-driven Knowledge-Sharing Systems (QKSS), which extend data management systems with knowledge-sharing capabilities to facilitate collaboration among different teams of data scientists. Relevant tacit knowledge about data sources is extracted from SQL query logs and externalized to support data source discovery and data integration...
In larger organizations, multiple teams of data scientists have to integrate data from heterogeneous data sources as preparation for data analysis tasks. Writing effective analytical queries requires data scientists to have in-depth knowledge of the existence, semantics, and usage context of data sources. Once gathered, such knowledge is informally...
In the medical domain, data quality is very important. Since requirements and data change frequently, continuous and sustainable monitoring and improvement of data quality are necessary. Working together with managers of medical centers, we developed an architecture for a data quality monitoring system. The architecture enables domain experts to ada...
The α-Flow project enables process support in heterogeneous and inter-institutional scenarios in healthcare. α-Flow provides a distributed case file and represents workflow schemas as documents which are shared coequally with content documents. The activity progress and data flow are controlled by process-related metadata. A use case will motivate use...