Davide Mottin

Davide Mottin
  • Professor (Associate) at Aarhus University

About

72
Publications
6,285
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
991
Citations
Current institution
Aarhus University
Current position
  • Professor (Associate)

Publications

Publications (72)
Preprint
Full-text available
Recent advancements in AI for biological research focus on integrating molecular data with natural language to accelerate drug discovery. However, the scarcity of high-quality annotations limits progress in this area. This paper introduces LA$^3$, a Language-based Automatic Annotation Augmentation framework that leverages large language models to a...
Article
Full-text available
Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique to determine the electronic structure of solids. Advances in light sources for ARPES experiments are currently leading to a vast increase of data acquisition rates and data quantity. On the other hand, access time to the most advanced ARPES instruments remains st...
Preprint
Large Language Models (LLMs) have shown remarkable capabilities in processing various data structures, including graphs. While previous research has focused on developing textual encoding methods for graph representation, the emergence of multimodal LLMs presents a new frontier for graph comprehension. These advanced models, capable of processing b...
Preprint
Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique to determine the electronic structure of solids. Advances in light sources for ARPES experiments are currently leading to a vast increase of data acquisition rates and data quantity. On the other hand, access time to the most advanced ARPES instruments remains st...
Preprint
Full-text available
Entity summarization aims to compute concise summaries for entities in knowledge graphs. Existing datasets and benchmarks are often limited to a few hundred entities and discard graph structure in source knowledge graphs. This limitation is particularly pronounced when it comes to ground-truth summaries, where there exist only a few labeled summari...
Article
Full-text available
Angle-resolved photoemission spectroscopy (ARPES) is a technique used to map the occupied electronic structure of solids. Recent progress in x-ray focusing optics has led to the development of ARPES into a microscopic tool, permitting the electronic structure to be spatially mapped across the surface of a sample. This comes at the expense of a time...
Chapter
Text-image retrieval (T2I) refers to the task of recovering all images relevant to a keyword query. Popular datasets for text-image retrieval, such as Flickr30k, VG, or MS-COCO, utilize annotated image captions, e.g., “a man playing with a kid”, as a surrogate for queries. With such surrogate queries, current multi-modal machine learning models, su...
Preprint
Full-text available
Angle-resolved photoemission spectroscopy (ARPES) is a technique used to map the occupied electronic structure of solids. Recent progress in X-ray focusing optics has led to the development of ARPES into a microscopic tool, permitting the electronic structure to be spatially mapped across the surface of a sample. This comes at the expense of a time...
Article
Full-text available
Community detection finds homogeneous groups of nodes in a graph. Existing approaches either partition the graph into disjoint, non-overlapping, communities, or determine only overlapping communities. To date, no method supports both detections of overlapping and non-overlapping communities. We propose UCoDe, a unified method for community detectio...
Preprint
Proposed as a solution to the inherent black-box limitations of graph neural networks (GNNs), post-hoc GNN explainers aim to provide precise and insightful explanations of the behaviours exhibited by trained GNNs. Despite their recent notable advancements in academic and industrial contexts, the robustness of post-hoc GNN explainers remains unexplo...
Preprint
Self-explainable deep neural networks are a recent class of models that can output ante-hoc local explanations that are faithful to the model's reasoning, and as such represent a step forward toward filling the gap between expressiveness and interpretability. Self-explainable graph neural networks (GNNs) aim at achieving the same in the context of...
Conference Paper
TSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study their full span of differences. We theoretically and experimentally evaluate the space of parameters in the TSNE and UMAP algorithms and observe that a s...
Preprint
Full-text available
tSNE and UMAP are popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. Despite their popularity, however, little work has been done to study their full span of differences. We theoretically and experimentally evaluate the space of parameters in both tSNE and UMAP and observe that a single one...
Article
How can we efficiently and scalably cluster high-dimensional data? The k -means algorithm clusters data by iteratively reducing intra-cluster Euclidean distances until convergence. While it finds applications from recommendation engines to image segmentation, its application to high-dimensional data is hindered by the need to repeatedly compute Euc...
Preprint
Full-text available
The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders t...
Preprint
We introduce a spectral notion of graph complexity derived from the Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space.
Article
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare...
Preprint
TSNE and UMAP are two of the most popular dimensionality reduction algorithms due to their speed and interpretable low-dimensional embeddings. However, while attempts have been made to improve on TSNE's computational complexity, no existing method can obtain TSNE embeddings at the speed of UMAP. In this work, we show that this is indeed possible by...
Preprint
Full-text available
Community detection is the unsupervised task of finding groups of nodes in a graph based on mutual similarity. Existing approaches for community detection either partition the graph in disjoint, non-overlapping, communities, or return overlapping communities. Currently, no method satisfactorily detects both overlapping and non-overlapping communiti...
Chapter
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Existing solutions either require auxiliary information such as node attributes, or provide a single-scale view of the graph by translating the problem into alig...
Preprint
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare...
Article
Full-text available
Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signifies importance. In real life, there are many graphs with weights on nodes and/or edges. For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, b...
Preprint
Full-text available
Analytical queries over RDF data are becoming prominent as a result of the proliferation of knowledge graphs. Yet, RDF databases are not optimized to perform such queries efficiently, leading to long processing times. A well known technique to improve the performance of analytical queries is to exploit materialized views. Although popular in relati...
Preprint
We introduce Explearn, an online algorithm that learns to jointly output predictions and explanations for those predictions. Explearn leverages Gaussian Processes (GP)-based contextual bandits. This brings two key benefits. First, GPs naturally capture different kinds of explanations and enable the system designer to control how explanations genera...
Article
Low-dimensional representations, or embeddings , of a graph's nodes facilitate several practical data science and data engineering tasks. As such embeddings rely, explicitly or implicitly, on a similarity measure among nodes, they require the computation of a quadratic similarity matrix, inducing a tradeoff between space complexity and embedding qu...
Preprint
Full-text available
Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, the techniques can speed up matrix operations, however, in practice, they are no...
Article
Knowledge graphs (KGs) represent facts in the form of subject-predicate-object triples and are widely used to represent and share knowledge on the Web. Their ability to represent data in complex domains augmented with semantic annotations has attracted the attention of both research and industry. Yet, their widespread adoption in various domains an...
Preprint
Full-text available
Low-dimensional representations, or embeddings, of a graph's nodes facilitate data mining tasks. Known embedding methods explicitly or implicitly rely on a similarity measure among nodes. As the similarity matrix is quadratic, a tradeoff between space complexity and embedding quality arises; past research initially opted for heuristics and linear-t...
Conference Paper
Can a system discover what a user wants without the user explicitly issuing a query? A recommender system proposes items of potential interest based on past user history. On the other hand, active search incites, and learns from, user feedback, in order to recommend items that meet a user's current tacit interests, hence promises to offer up-to-dat...
Conference Paper
Full-text available
Exploration is one of the primordial ways to accrue knowledge about the world and its nature. As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand. In this context, exploratory search provides a handy tool for progressively gather the necessary knowledge by starting...
Conference Paper
Full-text available
Exploration is one of the primordial ways to accrue knowledge about the world and its nature. As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand. In this context exploratory search provides a handy tool for progressively gather the necessary knowledge by starting...
Preprint
Full-text available
Generative models are often used to sample high-dimensional data points from a manifold with small intrinsic dimension. Existing techniques for comparing generative models focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop the first, to our knowledge, intrinsic and multi-scale me...
Conference Paper
Full-text available
Sadly, an empty abstract.
Chapter
The approaches presented so far are tailored to the three main data models, from fully structured data (relational), to semi-structured (graphs) and unstructured (textual) data. In the implementation of an exploratory system, the natural choice of the specific example-based technique to employ depends on the data model and the search task. However,...
Chapter
Exploratory search is the process of investigating an unfamiliar domain. As mentioned at the beginning of this book, in recent years, we have witnessed a growing importance of exploratory search. We produce data at unprecedented scale and speed and in heterogeneous and uncoordinated ways. This deluge of information renders the understanding of the...
Chapter
One look at relational data and the scenario resembles those offices with vast piles of documents organized in archives. The documents in our analogy illustrate the tuples and the archives the tables in a relational database. Relational databases superseded paper archives allowing the convenient categorization of a significant amount of information...
Chapter
In this chapter, we survey techniques for identifying elements and structures of interest within a large graph.
Chapter
In this section we move our attention to repositories of unstructured data, i.e., documents. This includes repositories of scientific papers, documents crawled from the Web, forum posts, social media postings, and other similar collections of textual-data.
Chapter
Until now we have considered users as mere spectators of the algorithmic process, while their preferences influence the result of the algorithm. The algorithms in previous chapters assume fixed user preferences or refinement criteria based on hardwired rules. Even in those approaches that allow for an interactive work-flow, the system can infer onl...
Preprint
Full-text available
Representing a graph as a vector is a challenging task; ideally, the representation should be easily computable and conducive to efficient comparisons among graphs, tailored to the particular data and analytical task at hand. Unfortunately, a "one-size-fits-all" solution is unattainable, as different analytical tasks may require different attention...
Article
Exploring knowledge graphs can be a daunting task for any user, expert or novice. This is due to the complexity of the schema or because they are unfamiliar with the contents of the data, or even because they do not know precisely what they are looking for. For the same reason there is a significant demand for exploratory methods for this kind of d...
Preprint
Full-text available
Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unf...
Conference Paper
We present MetaExp, a system that assists the user during the exploration of large knowledge graphs, given two sets of initial nodes. At its core, MetaExp presents a small set of meta-paths to the user, which are sequences of relationships among node types. Such meta-paths do not overwhelm the user with complex structures, yet they preserve semanti...
Article
Full-text available
Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adopting methods from words to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show...
Preprint
Full-text available
Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adopting methods from words to graphs, without defining a clearly comprehensible graph-related objective. Yet, as we show...
Article
Query answering routinely employs knowledge graphs to assist the user in the search process. Given a knowledge graph that represents entities and relationships among them, one aims at complementing the search with intuitive but effective mechanisms. In particular, we focus on the comparison of two or more entities and the detection of unexpected, s...
Article
Full-text available
Data usually comes in a plethora of formats and dimensions, rendering the exploration and information extraction processes cumbersome. Thus, being able to cast exploratory queries in the data with the intent of having an immediate glimpse on some of the data properties is becoming crucial. An exploratory query should be simple enough to avoid compl...
Conference Paper
The increasing interest in social networks, knowledge graphs, protein-interaction, and many other types of networks has raised the question how users can explore such large and complex graph structures easily. Current tools focus on graph management, graph mining, or graph visualization but lack user-driven methods for graph exploration. In many ca...
Article
Full-text available
Modern search engines employ advanced techniques that go beyond the structures that strictly satisfy the query conditions in an effort to better capture the user intentions. In this work, we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested. We call these queries exemplar queries....
Article
Full-text available
We propose a principled optimization-based interactive query relaxation framework for queries that return no answers. Given an initial query that returns an empty-answer set, our framework dynamically computes and suggests alternative queries with fewer conditions than those the user has initially requested, in order to help the user arrive at a qu...
Conference Paper
We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. Given a query issued by the user, the system, apart from returning the result patterns, also proposes a number of specializations (i.e., supergraphs) of the original query to facilitate the exploration of the results. We formalize the pro...
Article
Information graphs are generic graphs that model different types of information through nodes and edges. Knowledge graphs are the most common type of information graphs in which nodes represent entities and edges represent relationships among them. In this paper, we argue that exploitation of information graphs can lead into novel query answering c...
Article
We present IQR, a system that demonstrates optimization based interactive relaxations for queries that return an empty answer. Given an empty answer, IQR dynamically suggests one relaxation of the original query conditions at a time to the user, based on certain optimization objectives, and the user responds by either accepting or declining the rel...
Article
We demonstrate XQ, a query engine that implements a novel technique for searching relevant information on the web and in various data sources, called Exemplar Queries. While the traditional query model expects the user to provide a set of specifications that the elements of interest need to satisfy, XQ expects the user to provide only an element of...
Article
Search engines are continuously employing advanced techniques that aim to capture user intentions and provide results that go beyond the data that simply satisfy the query conditions. Examples include the personalized results, related searches, similarity search, popular and relaxed queries. In this work we introduce a novel query paradigm that con...
Article
Full-text available
Log information describing the items the users have selected from the set of answers a query engine returns to their queries constitute an excellent form of indirect user feedback that has been extensively used in the web to improve the effectiveness of search engines. In this work we study how the logs can be exploited to improve the ranking of th...
Article
We propose a principled optimization-based interactive query relaxation framework for queries that return no answers. Given an initial query that returns an empty answer set, our framework dynamically computes and suggests alternative queries with less conditions than those the user has initially requested, in order to help the user arrive at a que...

Network

Cited By