January 2024
·
3 Reads
March 2023
·
35 Reads
·
11 Citations
June 2022
·
20 Reads
·
7 Citations
Proceedings of the AAAI Conference on Artificial Intelligence
Web applications and services are increasingly important in a distributed internet filled with diverse cloud services and applications, each of which enables the completion of narrowly defined tasks. Given the explosion in the scale and diversity of such services, composing and integrating them to achieve complex user goals remains challenging for end-users and requires substantial development effort when specified by hand. We present a demonstration of the Goal Oriented Flow Assistant (GOFA) system, which provides a natural language solution to generate workflows for application integration. Our tool is built on a three-step pipeline: it first uses Abstract Meaning Representation (AMR) to parse utterances; it then uses a knowledge graph to validate candidates; and it finally uses an AI planner to compose the candidate flow. We provide a video demonstration of the deployed system as part of our submission.
May 2021
·
6 Reads
·
7 Citations
Proceedings of the AAAI Conference on Artificial Intelligence
Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent developments in machine learning and in particular deep learning have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that allows modeling and integrating entities by leveraging their relations and contextual information. This is achieved by combining siamese and graph neural networks to effectively propagate information between connected entities and support high scalability. We evaluated our approach on the task of integrating data about business entities, demonstrating that it outperforms both traditional rule-based systems and other deep learning approaches.
May 2021
·
18 Reads
Data integration has been studied extensively for decades and approached from different angles. However, this domain still remains largely rule-driven and lacks universal automation. Recent developments in machine learning and in particular deep learning have opened the way to more general and efficient solutions to data-integration tasks. In this paper, we demonstrate an approach that allows modeling and integrating entities by leveraging their relations and contextual information. This is achieved by combining siamese and graph neural networks to effectively propagate information between connected entities and support high scalability. We evaluated our approach on the task of integrating data about business entities, demonstrating that it outperforms both traditional rule-based systems and other deep learning approaches.
February 2021
·
463 Reads
Knowledge graph embedding methods learn embeddings of entities and relations in a low-dimensional space, which can be used for various downstream machine learning tasks such as link prediction and entity matching. Various graph convolutional network methods have been proposed which use different types of information to learn the features of entities and relations. However, these methods assign the same weight (importance) to all neighbors when aggregating information, ignoring the role of the different relations with the neighboring entities. To this end, we propose a relation-aware graph attention model that leverages relation information to compute different weights for the neighboring nodes when learning embeddings of entities and relations. We evaluate our proposed approach on link prediction and entity matching tasks. Our experimental results on link prediction on three datasets (one proprietary and two public) and results on unsupervised entity matching on one proprietary dataset demonstrate the effectiveness of the relation-aware attention.
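The idea of weighting neighbors by their relation can be illustrated with a minimal sketch. This is a generic GAT-style scoring function in which a relation embedding is concatenated to the two entity embeddings; it is an assumption made for illustration, not the formulation used in the paper. The names `relation_aware_attention` and `attn_vec` are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation commonly used in graph attention scoring."""
    return x if x > 0 else slope * x


def relation_aware_attention(h_i, neighbors, attn_vec):
    """Aggregate neighbor embeddings with relation-dependent weights.

    h_i       -- embedding of the target entity (list of floats)
    neighbors -- list of (h_j, r_ij): neighbor embedding and the embedding
                 of the relation connecting it to the target
    attn_vec  -- attention vector of length len(h_i) + len(h_j) + len(r_ij)
                 (hypothetical; it would be learned in a real model)
    Returns (weights, aggregated_embedding).
    """
    scores = []
    for h_j, r_ij in neighbors:
        features = h_i + h_j + r_ij  # list concatenation [h_i ; h_j ; r_ij]
        scores.append(leaky_relu(sum(a * f for a, f in zip(attn_vec, features))))

    # Numerically stable softmax over the neighborhood.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the neighbor embeddings.
    dim = len(neighbors[0][0])
    aggregated = [
        sum(w * h_j[d] for w, (h_j, _) in zip(weights, neighbors))
        for d in range(dim)
    ]
    return weights, aggregated


# Two neighbors with identical embeddings but different relations
# receive different attention weights.
h_i = [1.0, 0.0]
neighbors = [([0.5, 0.5], [1.0, 0.0]), ([0.5, 0.5], [0.0, 1.0])]
attn_vec = [0.1, 0.1, 0.2, 0.2, 1.0, -1.0]
weights, aggregated = relation_aware_attention(h_i, neighbors, attn_vec)
```

Because the relation embedding enters the score, two otherwise identical neighbors are no longer forced to share the same weight, which is the limitation of uniform aggregation that the abstract points out.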
March 2020
·
25 Reads
·
2 Citations
Communications in Computer and Information Science
Today’s enterprise decision making relies heavily on insights derived from vast amounts of data from different sources. To acquire these insights, the available data must be cleaned, integrated and linked. In this work, we focus on the problem of linking records that contain textual descriptions of IT products.
December 2019
·
118 Reads
·
18 Citations
Record linkage is an essential part of nearly all real-world systems that consume structured and unstructured data coming from different sources. Typically no common key is available for connecting records. Massive data integration processes often have to be completed before any data analytics and further processing can be performed. In this work we focus on company entity matching, where company name, location and industry are taken into account. Our contribution is a highly scalable, enterprise-grade end-to-end system that uses rule-based linkage algorithms in combination with a machine learning approach to account for short company names. Linkage time is greatly reduced by an efficient decomposition of the search space using MinHash. Based on real-world ground truth datasets, we show that our approach reaches a recall of 91% compared to 73% for baseline approaches, while scaling linearly with the number of nodes used in the system.
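How MinHash can decompose the search space is sketched below using classic banding-based locality-sensitive hashing: records whose signatures agree on at least one band land in the same bucket, and only those pairs are scored further. This is an illustrative assumption, not the system described in the abstract; the function names, the character-trigram tokenization, and the `bands`/`rows` values are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(name, k=3):
    """Character k-grams of a lowercased company name."""
    name = name.lower()
    return {name[i:i + k] for i in range(max(1, len(name) - k + 1))}


def signature(tokens, num_perm):
    """MinHash signature: the minimum of each seeded hash over the tokens."""
    def h(seed, token):
        data = f"{seed}:{token}".encode("utf-8")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
    return [min(h(seed, t) for t in tokens) for seed in range(num_perm)]


def candidate_pairs(records, bands=16, rows=2):
    """Return id pairs that share at least one LSH band bucket.

    records -- dict mapping record id to company name
    """
    buckets = defaultdict(set)
    for rec_id, name in records.items():
        sig = signature(shingles(name), bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rec_id)

    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    1: "Acme Corporation",
    2: "Acme Corporation",
    3: "Zyx Quantum Labs",
}
pairs = candidate_pairs(records)
```

Only the pairs returned by `candidate_pairs` would need to go through the more expensive rule-based and learned scoring, instead of every pair of records.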
July 2019
·
104 Reads
Record linkage is an essential part of almost all real-world systems that consume structured and unstructured data coming from different sources. Typically no common key is available to connect the records. Often massive data cleaning and data integration processes have to be completed before any data analytics and further processing can be performed. Though record linkage is often seen as a somewhat tedious but necessary step, it can reveal valuable insights into the data at hand. These insights guide further analytic approaches over the data and support data visualization. In this work we focus on company entity matching, where company name, location and industry are taken into account. The matching is done on the fly to accommodate real-time processing of streamed data. Our contribution is a system that uses rule-based matching algorithms for scoring operations, which we extend with a machine learning approach to account for short company names. We propose an end-to-end, highly scalable, enterprise-grade system. Linkage time is greatly reduced by efficient decomposition of the search space using MinHash. High linkage accuracy is reached by the proposed thorough scoring process of the matching candidates. Based on two real-world ground truth datasets, we show that our approach reaches a recall of 91% compared to 86% for baseline approaches. These results are achieved while scaling linearly with the number of nodes used in the system.
... Our work therefore simulates the use of xAI for explanatory debugging [19,22] with concept-based explanations [21], also called the "glitch detector task" [41,43]. We investigate how xAI may improve people's mental models for AI [2,10], and how personalized xAI will affect people's ability to accurately identify when their assistant is correct or incorrect (i.e., if the agent adapts to the user, will the user make fewer mistakes?). Our contributions include: ...
March 2023
... Existing literature suggests that supervised learning methods applied to record linkage provide better results than unsupervised methods, such as K-means (e.g., Christen 2012). Recent advances include combining graph convolutional networks and siamese networks to leverage the relationships and contextual information in knowledge graphs (Krivosheev et al. 2021). Nevertheless, supervised approaches require a substantial volume of high-quality training data, which might be difficult and too expensive to obtain in practice, as the data distribution is highly unbalanced towards negative (non-match) pairs. ...
May 2021
Proceedings of the AAAI Conference on Artificial Intelligence
... Recently, there have been several applications in which multiple plans are first generated and the users are then involved in the selection process. Some of these applications are in the areas of patient monitoring, enterprise risk management, conversational systems (Chakraborti et al. 2022; Rizk et al. 2020; Sreedharan et al. 2020b), and web service composition (Brachman et al. 2022). However, the user interfaces for interacting with such systems have received little attention. ...
June 2022
Proceedings of the AAAI Conference on Artificial Intelligence
... RL is in charge of joining various representations of the same entity (e.g., a company, an organization, a product, etc.) residing in structured records coming from different datasets [23]. Record linkage (RL) has been extensively studied in recent decades. ...
Reference:
Fast Record Linkage for Company Entities
March 2020
Communications in Computer and Information Science
... The MinHash algorithm, when used with the LSH Forest data structure, represents a text similarity method that approximates the Jaccard set similarity score [32]. MinHash was used to replace the large sets of string data with smaller "signatures" that still preserve the underlying similarity metric, producing a signature matrix, but a pair-wise signature comparison was still needed. This is where the LSH Forest comes into play. ...
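The claim that MinHash signatures preserve Jaccard similarity can be illustrated with a short sketch: the fraction of positions where two signatures agree is an estimate of the Jaccard score of the underlying sets. This is a minimal standard-library version under stated assumptions (character trigrams as set elements, seeded BLAKE2b hashes standing in for random permutations, 256 hash functions); all function names are hypothetical, and it does not include the LSH Forest index mentioned in the excerpt.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def jaccard(a, b):
    """Exact Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, token):
    """Seeded 64-bit hash; each seed plays the role of one permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=256):
    """Signature = the minimum hash value under each seeded hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = shingles("International Business Machines")
b = shingles("International Business Machine Corp")
exact = jaccard(a, b)
approx = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(f"exact={exact:.3f} estimated={approx:.3f}")
```

The signatures have a fixed length regardless of how large the original sets are, but comparing every pair of signatures is still quadratic in the number of records, which is the gap an index structure such as the LSH Forest is meant to close.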
December 2019