About
Publications: 61
Reads: 11,079
Citations: 752
Additional affiliations
October 2021 - present
January 2020 - November 2021
Education
October 2016 - October 2019
September 2014 - September 2016
September 2011 - July 2014
Publications (61)
Named entity recognition is an important task when constructing knowledge bases from unstructured data sources. Whereas entity detection methods mostly rely on extensive training data, Large Language Models (LLMs) have paved the way towards approaches that rely on zero-shot learning (ZSL) or few-shot learning (FSL) by taking advantage of the capabi...
Numerous methods and pipelines have recently emerged for the automatic extraction of knowledge graphs from documents such as scientific publications and patents. However, adapting these methods to incorporate alternative text sources like micro-blogging posts and news has proven challenging as they struggle to model open-domain entities and relatio...
Several techniques and workflows have emerged recently for automatically extracting knowledge graphs from documents like scientific articles and patents. However, adapting these approaches to integrate alternative text sources such as micro-blogging posts and news and to model open-domain entities and relationships commonly found in these sources i...
The labor market is a dynamic and rapidly evolving environment. Job positions that require expertise in various sectors often lead candidates to question their suitability. Therefore, it is crucial to furnish them with relevant, accurate, and timely information. In this article, we introduce a knowledge plug-in for existing conversational agents de...
This paper explores the growing importance of Environmental, Social, and Governance (ESG) criteria in financial assessments and conducts an AI-driven analysis of ESG concepts' evolution from 1980 to 2022. Focusing on media sources from the United States and the United Kingdom, the study utilizes the Dow Jones News Article dataset for a comprehensiv...
In recent years, the significance of Environmental, Social, and Governance criteria in assessing financial investments has grown significantly. This paper presents an AI-driven analysis of ESG concepts and their evolution from 1980 to 2022, with a specific focus on media sources from the United States and the United Kingdom. The primary data source...
In today’s rapidly evolving labor market, the emergence of new roles and the decline of traditional ones have led to a complex landscape of job titles and skill requirements. This complexity often causes ambiguity and confusion, affecting both novices and experienced professionals. To address this, extensive international efforts have produced refe...
The labor market is a key part of an economy. Several existing online platforms allow the upload of resumes and the search for a job. One of their limitations, however, is that obtaining the best opportunity can be hard because certain jobs require experience, abilities, and features that an applicant might not know about. The recent diffusion and emp...
Research data is on its way to being recognized as a first-class citizen in research; however, despite its importance for science, software still has a long way to go. Recent initiatives are paving the way, including FAIR for Research Software and Software Management Plans. A step further towards machine-actionability is adding a structured metada...
Textual documents are the means of sharing information and preserving knowledge in a large variety of domains. The patent domain also uses this paradigm, which is becoming difficult to maintain and is limiting the potential of advanced AI systems for domain analysis. To overcome this issue, it is more and more frequent to find appr...
Searching and exploring online information is fundamental for our society. However, it is common to find inaccurate information on the Internet, which can quickly spread and be hard to identify. Fortunately, today, many fact-checking sources verify online information to provide online users with a means to recognize its truthfulness. These sources u...
Natural Language Processing (NLP) is crucial for recommending items that can only be described by natural language. However, NLP usage within recommendation modules is difficult and usually requires a substantial initial effort, thus limiting its widespread adoption. To overcome this limitation, we introduce FORESEE, a novel architecture...
The ever-increasing amount of research output through scientific articles requires means to enable transparency and a better understanding of key entities of the research lifecycle, referred to as research artifacts, such as methods, software, datasets, etc. Research Knowledge Graphs (RKG) make research artifacts findable, accessible, interoperable...
In the last few years, we have witnessed the emergence of several knowledge graphs that explicitly describe research knowledge with the aim of enabling intelligent systems for supporting and accelerating the scientific process. These resources typically characterize a set of entities in this space (e.g., tasks, methods, evaluation techniques, prote...
Science communication has a number of bottlenecks that include the rising number of published research papers and its non-machine-accessible and document-based paradigm, which makes the exploration, reading, and reuse of research outcomes rather inefficient. Recently, Knowledge Graphs (KG), i.e., semantic interlinked networks of entities, have been...
In recent years, we saw the emergence of several approaches for producing machine-readable, semantically rich, interlinked descriptions of the content of research publications, typically encoded as knowledge graphs. A common limitation of these solutions is that they address a low number of articles, either because they rely on human experts to sum...
Lexicons have risen as alternative resources to common supervised methods for classification or regression in different domains (e.g., Sentiment Analysis). These resources (especially lexical) lack important domain context, and it is not possible to tune/edit/improve them depending on new domains and data. With the exponential production of data...
Frequently, Text Classification is limited by insufficient training data. This problem is addressed by Zero-Shot Classification through the inclusion of external class definitions and then exploiting the relations between classes seen during training and unseen classes (Zero-shot). However, it requires a class embedding space capable of accurately...
Cultural heritage portals have the goal of providing users with seamless access to all their resources. This paper introduces initial efforts for a user-oriented restructuring of the German Digital Library (DDB). At present, cultural heritage objects (CHOs) in the DDB are modeled using an extended version of the Europeana Data Model (DDB-EDM), whic...
Under the German government's initiative "NEUSTART Kultur", the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance user experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the D...
Archival records are essential sources of information for historians and digital humanists to understand history. For modern information systems they are often analysed and integrated into Knowledge Graphs for better access, interoperability and re-use. However, due to restrictions of the representation of RDF predicates, temporal data within archiv...
An increasing number of archival institutions aim to provide public access to historical documents. Ontologies have been designed, developed and utilised to model the archival description of historical documents and to enable interoperability between different information sources. However, due to the heterogeneous nature of archives and archival sy...
Materials are either an enabler or a bottleneck for the vast majority of technological innovations. The digitization of materials and processes is mandatory to create live production environments which represent physical entities and their aggregations and thus make it possible to represent, share, and understand materials changes. However, a common standard forma...
The field of Materials Science is concerned with, e.g., properties and performance of materials. An important class of materials are crystalline materials that usually contain “dislocations”, a line-like defect type. Dislocations decisively determine many important materials properties. Over the past decades, significant effort was put into unde...
The First International Workshop on Enabling Data-Driven Decisions from Learning on the Web (L2D 2021) was held as part of the 14th ACM International Conference on Web Search and Data Mining (WSDM 2021) on March 12, 2021. The workshop collected novel, original research on the state of the art of online education empowered with data mining and machi...
The field of Materials Science is concerned with, e.g., properties and performance of materials. An important class of materials are crystalline materials that usually contain “dislocations”, a line-like defect type. Dislocations decisively determine many important materials properties. Over the past decades, significant effort was put into underst...
The design and delivery of platforms for online education is fostering increasingly intense research. Scaling up education online brings new emerging needs related to, for example, hardly manageable classes, overwhelming content alternatives, and academic dishonesty while interacting remotely. However, with the impressive progress of the data m...
Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of the patient's health state. All these data can be analyzed and employed to cater to novel services that can help people and domain experts with their common healthcare tasks. However, many technologies such as Deep Learning a...
The amount of scientific literature continuously grows, which poses an increasing challenge for researchers to manage, find and explore research results. Therefore, the classification of scientific work is widely applied to enable the retrieval, support the search of suitable reviewers during the reviewing process, and in general to organize the ex...
The huge number of scholarly articles published every day in different domains makes it hard for experts to organize and stay updated with new research in a particular domain. This study gives an overview of a new approach, HierClasSArt, for knowledge-aware hierarchical classification of scholarly articles in mathematics into a predefine...
Today, increasing numbers of people are interacting online and a lot of textual comments are being produced due to the explosion of online communication. However, a paramount inconvenience within online environments is that comments that are shared within digital platforms can hide hazards, such as fake news, insults, harassment, and, more in gener...
The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is related to the fact that its analysis has become difficult due to the high volume of published papers for which manual effort for annotations and management is required. Novel technological infrastructures are needed to hel...
Cultural heritage institutions store and digitize large amounts of multimedia data inside archives to make archival records findable by archivists, scientists, and the general public. Cataloging standards vary from archive to archive and, therefore, the sharing and use of this data are limited. To solve this issue, linked open data (LOD) is rising as a...
An important part of European cultural identity relies on European cities and in particular on their histories and cultural heritage. Nuremberg, the home of important artists such as Albrecht Dürer and Hans Sachs, developed into the epitome of German and European culture already during the Middle Ages. Throughout history, the city experienced a numb...
Scientific knowledge has traditionally been disseminated and preserved through research articles published in journals, conference proceedings, and online archives. However, this article-centric paradigm has often been criticized for not allowing automatic processing, categorization, and reasoning on this knowledge. An alternative vision is to gener...
Document exploration in archives is often challenging due to the lack of organization in topic-based categories. Moreover, archival records only provide short text, which is often insufficient for capturing the semantics. This paper proposes and explores a dataless categorization approach that utilizes word embeddings and TF-IDF to categorize archiva...
Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of the patient's health state. All these data can be analyzed and employed to cater to novel services that can help people and domain experts with their common healthcare tasks. However, many technologies such as Deep Lea...
With tons of healthcare reviews being collected online, finding helpful opinions among this collective intelligence is becoming harder. Existing literature in this domain usually tackled helpfulness prediction with machine-learning models optimized for binary classification. While they can filter out a subset of reviews, users might still be overwh...
Recent developments and advancements in several areas of Computer Science such as Semantic Web, Natural Language Understanding, Knowledge Representation, and more in general Artificial Intelligence have enabled the development of automatic and smart systems able to address various challenges and tasks. In this paper, we present a scalable and flexible huma...
In this paper, we present a preliminary approach that uses a set of NLP and Deep Learning methods for extracting entities and relationships from research publications and then integrates them in a Knowledge Graph. More specifically, we (i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing...
Social media are providing the humus for the sharing of knowledge and experiences and the growth of community activities (e.g., debating about different topics). The analysis of the user-generated content in this area usually relies on Sentiment Analysis. Word embeddings and Deep Learning have attracted extensive attention in various sentiment dete...
Linked Open Data (LOD) is the publicly available RDF data in the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global-scale knowledge potentially available to any human as well as artificial intelligence that may want to benefit from it as background knowledge for supporting their tasks. LOD has emerged as the backb...
During the last decades, a huge amount of data has been collected in clinical databases in the form of medical reports, laboratory results, treatment plans, etc., representing patients' health status. Hence, digital information available for patient-oriented decision making has increased drastically but it is often not mined and analyzed in depth s...
Knowledge graphs (KG) are large networks of entities and relationships, typically expressed as RDF triples, relevant to a specific domain or an organization. Scientific Knowledge Graphs (SKGs) focus on the scholarly domain and typically contain metadata describing research publications such as authors, venues, organizations, research topics, and ci...
Background: Networks whose nodes have labels can seem complex. Fortunately, many have substructures that occur often ("motifs"). A societal example of a motif might be a household. Replacing such motifs by named supernodes reduces the complexity of the network and can bring out insightful features. Doing so repeatedly may give hints about higher l...
With the proliferation in number and scale of online courses, several challenges have emerged in supporting stakeholders during their delivery and fruition. Machine Learning and Semantic Analysis can add value to the underlying online environments in order to overcome a subset of such challenges (e.g. classification, retrieval, and recommendation)....
Moving towards the next generation of personalized learning environments requires intelligent approaches powered by analytics for advanced learning contexts with enriched digital content. Micro-Learning through Massive Open Online Courses is riding the wave of popularity as a novel paradigm for delivering short educational videos in small pre-organ...
Multi-class classification aims at assigning each sample to one category chosen among a set of different options. In this paper, we present our work for the development of a novel system for multi-class classification of e-learning videos based on the covered educational subjects. The audio transcripts and the text depicted in visual frames are e...
Complex network analysis is being applied on topological models of ecological networks, to extrapolate their advanced properties and as part of the activity of land management. Commonly employed methods tend to focus on single target species. This is satisfactory for cognitive analysis, but the limited view provided by these models results in a lac...
Complex network analysis is rising as an essential tool to understand properties of ecological landscape networks, and as an aid to land management. The most common methods to build graph models of ecological networks are based on representing functional connectivity with respect to a target species. This has provided good results, but the lack of...