Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS
Recent publications
Artificial intelligence is increasingly penetrating industrial applications as well as areas that affect our daily lives. As a consequence, there is a need for criteria to validate whether the quality of AI applications is sufficient for their intended use. Both in the academic community and societal debate, an agreement has emerged under the term “trustworthiness” as the set of essential quality requirements that should be placed on an AI application. At the same time, the question of how these quality requirements can be operationalized is to a large extent still open. In this paper, we consider trustworthy AI from two perspectives: the product and organizational perspective. For the former, we present an AI-specific risk analysis and outline how verifiable arguments for the trustworthiness of an AI application can be developed. For the second perspective, we explore how an AI management system can be employed to assure the trustworthiness of an organization with respect to its handling of AI. Finally, we argue that in order to achieve AI trustworthiness, coordinated measures from both product and organizational perspectives are required.
Machine learning applications have become ubiquitous. They range from embedded control in production machines, through process optimization in diverse areas (e.g., traffic, finance, sciences), to direct user interactions like advertising and recommendations. This has led to an increased effort to make machine learning trustworthy. Explainable and fair AI have already matured. They address the knowledgeable user and the application engineer. However, there are users who want to deploy a learned model much as they would use their washing machine. These stakeholders do not want to spend time understanding the model, but want to rely on guaranteed properties. What are the relevant properties? How can they be expressed to the stakeholder without presupposing machine learning knowledge? How can they be guaranteed for a certain implementation of a machine learning model? These questions move far beyond the current state of the art and we want to address them here. We propose a unified framework that certifies learning methods via care labels. They are easy to understand and draw inspiration from well-known certificates like textile labels or property cards of electronic devices. Our framework considers both the machine learning theory and a given implementation. We test the implementation's compliance with theoretical properties and bounds.
Abstract Background and objectives Even in the early phase of the COVID-19 pandemic, which progressed very differently across the globe, there were indications that socioeconomic factors influence the spread dynamics of the disease, which, especially from the second phase (September 2020) onward, affected people of lower socioeconomic status more strongly. Such effects can also appear within a large city. The present study visualizes and examines the spatiotemporal spread of all COVID-19 cases reported in Cologne (February 2020–October 2021) at the district level and their possible association with socioeconomic factors. Methods Pseudonymized data on all COVID-19 cases reported in Cologne were geocoded, and their age-standardized distribution was mapped at the district level over 4 time periods and compared with the distribution of social factors. The possible influence of the selected factors is additionally examined in a regression analysis using a model with case growth rates. Results The small-scale local infection pattern changes over the course of the pandemic. Districts with weaker socioeconomic indices show higher incidence figures over a large part of the pandemic, with a positive correlation between poverty risk factors and age-standardized incidence. The strength of this correlation changes over time. Conclusion Timely observation and analysis of local spread dynamics reveal, even at the level of a large city, the positive correlation between adverse socioeconomic factors and the incidence rate of COVID-19, and can help to target local containment measures.
Machine learning and artificial intelligence have become crucial factors for the competitiveness of individual companies and entire economies. Yet their successful deployment requires access to a large volume of training data often not even available to the largest corporations. The rise of trustworthy federated digital ecosystems will significantly improve data availability for all participants and thus will allow a quantum leap for the widespread adoption of artificial intelligence at all scales of companies and in all sectors of the economy. In this chapter, we will explain how AI systems are built with data science and machine learning principles and describe how this leads to AI platforms. We will detail the principles of distributed learning which represents a perfect match with the principles of distributed data ecosystems and discuss how trust, as a central value proposition of modern ecosystems, carries over to creating trustworthy AI systems.
Background Unmanned aerial vehicle (UAV)–based image retrieval in modern agriculture enables gathering large amounts of spatially referenced crop image data. In large-scale experiments, however, UAV images suffer from containing a multitudinous amount of crops in a complex canopy architecture. Especially for the observation of temporal effects, this complicates the recognition of individual plants over several images and the extraction of relevant information tremendously. Results In this work, we present a hands-on workflow for the automatized temporal and spatial identification and individualization of crop images from UAVs abbreviated as “cataloging” based on comprehensible computer vision methods. We evaluate the workflow on 2 real-world datasets. One dataset is recorded for observation of Cercospora leaf spot—a fungal disease—in sugar beet over an entire growing cycle. The other one deals with harvest prediction of cauliflower plants. The plant catalog is utilized for the extraction of single plant images seen over multiple time points. This gathers a large-scale spatiotemporal image dataset that in turn can be applied to train further machine learning models including various data layers. Conclusion The presented approach improves analysis and interpretation of UAV data in agriculture significantly. By validation with some reference data, our method shows an accuracy that is similar to more complex deep learning–based recognition techniques. Our workflow is able to automatize plant cataloging and training image extraction, especially for large datasets.
Shifting from effect-oriented towards cause-oriented and systemic approaches in sustainable climate change adaptation requires a solid understanding of the climate-related and societal causes behind climate risks. Thus, capturing, systemizing, and prioritizing factors contributing to climate risks are essential for developing cause-oriented climate risk and vulnerability assessments (CRVA). Impact Chains (IC) are conceptual models used to capture hazard, vulnerability and exposure factors that lead to a specific risk. IC modeling includes a participatory stakeholder phase and an operational quantification phase. While ICs are widely implemented to systematically capture risk processes, they still show methodological gaps concerning, e.g., the integration of dynamic feedback or balanced stakeholder involvement. Such gaps usually only become apparent in practical applications, and there is currently no systematic perspective on common challenges and methodological needs. Therefore, we reviewed 47 articles applying IC and similar CRVA methods that consider the cause-effect dynamics governing risk. We provide an overview of common challenges and opportunities as a roadmap for future improvements. We conclude that IC should move from a linear to an impact-web-like representation of risk to integrate cause-effect dynamics. Qualitative approaches are based on significant stakeholder involvement to capture expert-, place-, and context-specific knowledge. The integration of IC into quantifiable, executable models is still highly underexplored due to a limited understanding of systems, data, evaluation options, and other uncertainties. Ultimately, using IC to capture the underlying, complex processes behind risk supports effective, long-term, and sustainable climate change adaptation.
With the objective of enhancing human performance and maximizing engagement during the performance of tasks, we aim to advance automation for decision making in complex and large-scale multi-agent settings. Towards these goals, this paper presents a deep multi-agent reinforcement learning method for resolving demand-capacity imbalances in real-world Air Traffic Management settings with thousands of agents. Agents comprising the system are able to jointly decide on the measures to be applied to resolve imbalances, while they provide explanations of their decisions; this information is rendered and explored via appropriate visual analytics tools. The paper presents how major challenges of scalability and complexity are addressed, and provides results from evaluation tests that show the abilities of models to provide high-quality solutions and high-fidelity explanations.
Deployment of modern data-driven machine learning methods, most often realized by deep neural networks (DNNs), in safety-critical applications such as health care, industrial plant control, or autonomous driving is highly challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability and implausible predictions to directed attacks by means of malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from so-called safety concerns, properties that preclude their deployment as no argument or experimental setup can help to assess the remaining risk. In recent years, an abundance of state-of-the-art techniques aiming to address these safety concerns has emerged. This chapter provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigation. Our work addresses machine learning experts and safety engineers alike: The former ones might profit from the broad range of machine learning topics covered and discussions on limitations of recent methods. The latter ones might gain insights into the specifics of modern machine learning methods. We hope that this contribution fuels discussions on desiderata for machine learning systems and strategies on how to help to advance existing approaches accordingly.
Currently, the methodological and technical developments in visual analytics, as well as the existing theories, are not sufficiently grounded by empirical studies that can provide an understanding of the processes of visual data analysis, analytical reasoning and derivation of new knowledge by humans. We conducted an exploratory empirical study in which participants analysed complex and data‐rich visualisations by detecting salient visual patterns, translating them into conceptual information structures and reasoning about those structures to construct an overall understanding of the analysis subject. Eye tracking and voice recording were used to capture this process. We analysed how the data we had collected match several existing theoretical models intended to describe visualisation‐supported reasoning, knowledge building, decision making or use and development of mental models. We found that none of these theoretical models alone is sufficient for describing the processes of visual analysis and knowledge generation that we observed in our experiments, whereas a combination of three particular models could be apposite. We also pondered whether empirical studies like ours can be used to derive implications and recommendations for possible ways to support users of visual analytics systems. Our approaches to designing and conducting the experiments and analysing the empirical data were appropriate to the goals of the study and can be recommended for use in other empirical studies in visual analytics. Participants of an exploratory empirical study analysed complex visualisations by detecting salient visual patterns and reasoning about them to construct an overall understanding of the analysis subject. We matched the observed analysis processes to several theoretical models describing visualisation‐supported reasoning, knowledge building, decision making, or use and development of mental models.
We consider the general problem known as job shop scheduling, in which multiple jobs consist of sequential operations that need to be executed or served by appropriate machines having limited capacities. For example, train journeys (jobs) consist of moves and stops (operations) to be served by rail tracks and stations (machines). A schedule is an assignment of the job operations to machines and times where and when they will be executed. Developers of computational methods for job scheduling need tools enabling them to explore how their methods work. At a high level of generality, we define the system of pertinent exploration tasks and a combination of visualisations capable of supporting the tasks. We provide general descriptions of the purposes, contents, visual encoding, properties, and interactive facilities of the visualisations and illustrate them with images from an example implementation in air traffic management. We justify the design of the visualisations based on the tasks, principles of creating visualisations for pattern discovery, and scalability requirements. The outcomes of our research are sufficiently general to be of use in a variety of applications.
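The definition of a schedule given above (an assignment of job operations to machines and times, subject to operation order and limited machine capacity) can be illustrated with a minimal feasibility check. This is only a sketch under stated assumptions: the `Operation` record, the `is_feasible` function, and the integer time units are hypothetical names chosen for illustration and are not taken from the publication.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One scheduled operation (hypothetical record, for illustration only)."""
    job: str      # e.g. a train journey
    index: int    # position of the operation within its job
    machine: str  # e.g. a rail track or station
    start: int    # start time (inclusive)
    end: int      # end time (exclusive)


def is_feasible(schedule, capacity):
    """Check operation order within each job and capacity on each machine."""
    # Every operation must have a positive duration.
    if any(op.end <= op.start for op in schedule):
        return False

    # Precedence: within a job, an operation may not start before
    # the previous operation of the same job has ended.
    by_job = defaultdict(list)
    for op in schedule:
        by_job[op.job].append(op)
    for ops in by_job.values():
        ops.sort(key=lambda o: o.index)
        for prev, nxt in zip(ops, ops[1:]):
            if nxt.start < prev.end:
                return False

    # Capacity: sweep over start/end events per machine and track the load.
    events = defaultdict(list)
    for op in schedule:
        events[op.machine].append((op.start, 1))
        events[op.machine].append((op.end, -1))
    for machine, machine_events in events.items():
        load = 0
        # Tuples sort so that, at equal times, releases (-1) come before starts (+1).
        for _, delta in sorted(machine_events):
            load += delta
            if load > capacity.get(machine, 1):  # assumed default capacity: 1
                return False
    return True
```

For example, two trains using `"track_A"` with capacity 1 over the intervals 0 to 5 and 5 to 9 pass the check, whereas intervals 0 to 5 and 3 to 9 fail it because both trains would occupy the track between times 3 and 5.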
Senior researcher Vanessa Lage-Rupprecht and two collaborators talk about what data science means to them and illustrate how they managed to create a data and lab coexistence in their drug-repurposing project, which was recently published in Patterns. In this article, they have developed a drug-target-mechanism-oriented data model, Human Brain PHARMACOME, and have presented it as a resource to the community.
Ontologies – providing an explicit schema for underlying data – often serve as background knowledge for machine learning approaches. Similar to ILP methods, concept learning utilizes such ontologies to learn concept expressions from examples in a supervised manner. This learning process is usually cast as a search process through the space of ontologically valid concept expressions, guided by heuristics. Such heuristics usually try to balance explorative and exploitative behaviors of the learning algorithms. While exploration ensures a good coverage of the search space, exploitation focuses on those parts of the search space likely to contain accurate concept expressions. However, at their extreme ends, both paradigms are impractical: A totally random explorative approach will only find good solutions by chance, whereas a greedy but myopic, exploitative attempt might easily get trapped in local optima. To combine the advantages of both paradigms, different meta-heuristics have been proposed. In this paper, we examine the Simulated Annealing meta-heuristic and how it can be used to balance the exploration-exploitation trade-off in concept learning. In different experimental settings, we analyse how and where existing concept learning algorithms can benefit from the Simulated Annealing meta-heuristic.
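The exploration-exploitation trade-off described above can be sketched with a generic Simulated Annealing loop. This is a minimal illustration, not the algorithm evaluated in the paper: the function names, the geometric cooling schedule, and all parameter values are assumptions. In a concept learning setting, `neighbours` would be played by a refinement operator over concept expressions and `score` by an accuracy-based heuristic; both are left abstract here.

```python
import math
import random


def accept(current_score, candidate_score, temperature, rng):
    """Metropolis acceptance rule for a maximisation problem."""
    if candidate_score >= current_score:
        return True  # improvements are always taken (exploitation)
    if temperature <= 0:
        return False  # at zero temperature the search is purely greedy
    # Worse candidates are accepted with a probability that shrinks as the
    # score gap grows and as the temperature falls (exploration).
    return rng.random() < math.exp((candidate_score - current_score) / temperature)


def simulated_annealing(start, neighbours, score,
                        t_start=1.0, cooling=0.95, steps=200, seed=0):
    """Return the best state seen; `neighbours(state)` must return a non-empty list."""
    rng = random.Random(seed)
    current = best = start
    temperature = t_start
    for _ in range(steps):
        candidate = rng.choice(neighbours(current))
        if accept(score(current), score(candidate), temperature, rng):
            current = candidate
        if score(current) > score(best):
            best = current
        temperature *= cooling  # assumed geometric cooling schedule
    return best
```

A high starting temperature makes the search behave almost randomly, while a temperature near zero reduces it to greedy hill climbing; the cooling schedule moves the search from the first regime towards the second.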
We propose an approach to underpin interactive visual exploration of large data volumes by training a Learned Visualization Index (LVI). Given advance knowledge of the data, the aggregation functions that are used for visualization, the visual encoding, and the available interactive operations for data selection, LVI makes it possible to avoid time-consuming data retrieval and processing of raw data in response to the user’s interactions. Instead, LVI directly predicts aggregates of interest for the user’s data selection. We demonstrate the efficiency of the proposed approach in application to two use cases of spatio-temporal data at different scales.
Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, as is evident from the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, remains untapped. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph, missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.
The high number of failed pre-clinical and clinical studies for compounds targeting Alzheimer disease (AD) has demonstrated that there is a need to reassess existing strategies. Here, we pursue a holistic, mechanism-centric drug repurposing approach combining computational analytics and experimental screening data. Based on this integrative workflow, we identified 77 druggable modifiers of tau phosphorylation (pTau). One of the upstream modulators of pTau, HDAC6, was screened with 5,632 drugs in a tau-specific assay, resulting in the identification of 20 repurposing candidates. Four compounds and their known targets were found to have a link to AD-specific genes. Our approach can be applied to a variety of AD-associated pathophysiological mechanisms to identify more repurposing candidates.
Geospatial knowledge has always been an essential driver for many societal aspects. This concerns in particular urban planning and urban growth management. To gain insights from geospatial data and guide decisions, authoritative and open data sources are usually used, combined with user or citizen sensing data. However, we see great potential for improving geospatial analytics by combining geospatial data with rich terminological knowledge, e.g., as provided by the Linked Open Data Cloud. Having semantically explicit, integrated geospatial and terminological knowledge, expressed by means of established vocabularies and ontologies, cross-domain spatial analytics can be performed. One analytics technique working on terminological knowledge is inductive concept learning, an approach that learns classifiers expressed as logical concept descriptions. In this paper, we extend inductive concept learning to infer and make use of the spatial context of entities in spatio-terminological data. We propose a formalism for extracting and making spatial relations explicit such that they can be exploited to learn spatial concept descriptions, enabling ‘spatially aware’ concept learning. We further provide an implementation of this formalism and demonstrate its capabilities in different evaluation scenarios.
SPARQL query generation from natural language questions is complex because it requires an understanding of both the question and underlying knowledge graph (KG) patterns. Most SPARQL query generation approaches are template-based, tailored to a specific knowledge graph and require pipelines with multiple steps, including entity and relation linking. Template-based approaches are also difficult to adapt for new KGs and require manual efforts from domain experts to construct query templates. To overcome this hurdle, we propose a new approach, dubbed SGPT, that combines the benefits of end-to-end and modular systems and leverages recent advances in large-scale language models. Specifically, we devise a novel embedding technique that can encode linguistic features from the question which enables the system to learn complex question patterns. In addition, we propose training techniques that allow the system to implicitly employ the graph-specific information (i.e., entities and relations) into the language model’s parameters and generate SPARQL queries accurately. Finally, we introduce a strategy to adapt standard automatic metrics for evaluating SPARQL query generation. A comprehensive evaluation demonstrates the effectiveness of SGPT over state-of-the-art methods across several benchmark datasets.
In times of climate change, growing world population, and the resulting scarcity of resources, efficient and economical usage of agricultural land is increasingly important and challenging at the same time. To avoid disadvantages of monocropping for soil and environment, it is advisable to practice intercropping of various plant species whenever possible. However, intercropping is challenging as it requires a balanced planting schedule due to individual cultivation time frames. Maintaining a continuous harvest throughout the season is important as it reduces logistical costs and related greenhouse gas emissions, and can also help to reduce food waste. Motivated by the prevention of food waste, this work proposes a flexible optimization method for a full harvest season of large crop ensembles that complies with given economical and environmental constraints. Our approach applies evolutionary algorithms and we further combine our evolution strategy with a sophisticated hierarchical loss function and adaptive mutation rate. We thus transfer the multi-objective into a pseudo-single-objective optimization problem, for which we obtain faster and better solutions than those of conventional approaches.
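The combination of an evolution strategy, a hierarchical loss, and an adaptive mutation rate mentioned above can be sketched as follows. All specifics are assumptions made for illustration and are not the authors' implementation: the hierarchy is modelled as a tuple compared lexicographically (constraint violation first, cost second), the adaptation uses the classic 1/5 success rule of a (1+1) evolution strategy, and the toy objective in `hierarchical_loss` is hypothetical.

```python
import random


def hierarchical_loss(x):
    """Hypothetical two-level loss: constraint violation first, then cost.

    Python compares tuples lexicographically, so any reduction of the
    violation outranks any change in cost. This turns two objectives into
    a single ordering, i.e. a pseudo-single-objective problem.
    """
    violation = max(0.0, 1.0 - sum(x))  # assumed constraint: sum(x) >= 1
    cost = sum(v * v for v in x)        # assumed secondary objective
    return (violation, cost)


def one_plus_one_es(loss, start, sigma=0.5, steps=300, seed=0):
    """(1+1) evolution strategy with 1/5-success-rule step-size adaptation."""
    rng = random.Random(seed)
    parent, parent_loss = list(start), loss(start)
    for _ in range(steps):
        # Mutate every coordinate with Gaussian noise of width sigma.
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_loss = loss(child)
        if child_loss <= parent_loss:
            parent, parent_loss = child, child_loss
            sigma *= 1.5           # success: widen the mutation step
        else:
            sigma *= 1.5 ** -0.25  # failure: narrow it; balances at 1/5 successes
    return parent, parent_loss
```

Calling `one_plus_one_es(hierarchical_loss, [0.0, 0.0])` returns a candidate together with its loss tuple; because a child only replaces the parent when its loss tuple is no worse, the returned loss is never worse than that of the starting point.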
119 members
Gennady Andrienko
  • Knowledge Discovery (KD)
Stefan Rüping
  • Business Unit Big Data Analytics
Georg Fuchs
  • Business Unit Big Data Analytics
Daniel Stein
  • Fraunhofer-Institute for Intelligent Analysis and Information Systems IAIS
Christoph Schmidt
  • NetMedia (NM)
Konrad-Adenauer-Straße, Schloss Birlinghoven, 53757 Sankt Augustin, Germany
Head of institution
Prof. Dr. Stefan Wrobel
+49 (0) 2241 14-3000
+49 (0) 2241 14-4-3000