
Feiyu Xu
Lenovo · Lenovo Research
PD PhD habil.
Publications: 118
Reads: 36,197
Citations: 1,552
Introduction
Feiyu Xu is Vice President of Lenovo Group, and Head of Lenovo Research AI Lab.
Feiyu Xu was a Principal Researcher and head of the text analytics research group at the Language Technology Lab at DFKI, and a DFKI Research Fellow. She is also a co-founder of Yocoy Technologies GmbH and vice-director of the Joint Research Lab for Language Technology of Shanghai Jiao Tong University and Saarland University. Her research areas include big text data analytics, big data technologies, information extraction, semantic web, question answering, opinion mining and mobile applications.
Publications (118)
The drastic increase in user-generated content has provided a rich source for mining opinions. Unfortunately, the quality of user-generated content varies significantly, from excellent to meaningless, which, by general estimation, causes a great deal of difficulty in mining-related applications. In the field of low-quality review detection, many pr...
Unsupervised text style transfer is full of challenges due to the lack of parallel data and difficulties in content preservation. In this paper, we propose a novel neural approach to unsupervised text style transfer, which we refer to as Cycle-consistent Adversarial autoEncoders (CAE) trained from non-parallel data. CAE consists of three essential...
With the advancement of Artificial Intelligence (AI), algorithms bring more fairness challenges at the ethical, legal, psychological and social levels. People should start to face these challenges seriously when dealing with AI products and AI solutions. More and more companies are starting to recognize the importance of Diversity and Inclusion (D&I) due to AI...
Automatically generating Couinaud segments on liver, a prerequisite for modern surgery of the liver, from computed tomography (CT) volumes is a challenge for the computer-aided diagnosis (CAD). In this paper, we propose a novel global and local contexts UNet (GLC-UNet) for Couinaud segmentation. In this framework, intra-slice features and 3D contex...
Liver lesion detection on abdominal computed tomography (CT) is a challenging topic because of its large variance. Current detection methods based on a 2D convolutional neural network (CNN) are limited by the inconsistent view of lesions. One obvious observation is that it can easily lead to a discontinuity problem since it ignores the information...
Many healthcare applications would significantly benefit from the processing and analyzing of multi-modal data. In this paper, we propose a novel multi-task, multi-modal, and multi-attention framework to learn and align information from multiple medical sources. Based on experiments on a public medical dataset, we show that combining features from...
Previous works on meta-learning either relied on elaborately hand-designed network structures or adopted specialized learning rules to a particular domain. We propose a universal framework to optimize the meta-learning process automatically by adopting neural architecture search technique (NAS). NAS automatically generates and evaluates meta-learne...
Smart service chatbots, which aim to provide efficient, reliable and natural customer service, have grown rapidly in recent years. Understanding human-agent conversation, especially modeling conversational behavior, is essential to enhancing machine intelligence during customer-chatbot interaction. However, there is a gap between qualit...
Deep learning has made a significant contribution to the recent progress in artificial intelligence. In comparison to traditional machine learning methods such as decision trees and support vector machines, deep learning methods have achieved substantial improvements in various prediction tasks. However, deep neural networks (DNNs) are comparably weak...
Human agents in technical customer support provide users with instructional answers to solve a task that would otherwise require a lot of time, money, energy, and physical costs. Developing a dialogue system in this domain is challenging due to the broad variety of user questions. Moreover, user questions are noisy (for example, spelling mistakes), red...
Open Information Extraction systems, such as ReVerb, OLLIE, Clause IE, OpenIE 4.2, Stanford OIE, and PredPatt, have attracted much attention for English OIE. However, few studies have been reported on OIE for languages beyond English. This paper presents a Chinese OIE system, PLCOIE, to extract binary relation triples and N-ary relation tuples from Chi...
People use different words when expressing their opinions. Sentiment analysis, as a way to automatically detect and categorize people’s opinions in text, needs to reflect this diversity and individuality. One possible approach to analyzing such traits is to take a person’s past opinions into consideration. In practice, such a model can suffer from the...
Automatically generating diagnostic reports with interpretability for computed tomography (CT) volumes is a new challenge for the computer-aided diagnosis (CAD). In this paper, we propose a novel multimodal data and knowledge linking framework between CT volumes and textual reports with a semi-supervised attention mechanism. This multimodal framewo...
Human agents in technical customer support provide users with instructional answers to solve a task. Developing a technical support question answering (QA) system is challenging due to the broad variety of user intents. Moreover, user questions are noisy (for example, spelling mistakes), redundant and have various natural language expressions, which...
Most attempts at conversation with migrants who speak neither German nor English end in gesturing with hands and feet, and in frustration. The German Research Center for Artificial Intelligence (DFKI), in cooperation with its spin-off company Yocoy, has developed an app that helps immigrants from Arab countries hold dialogues, for example with public authorities, ...
MACSS (Medical All-round Care Service Solutions) is a project that aims to develop a patient-centered smart health service platform for improving patient safety after kidney transplantation. In addition to monitoring drug safety, the communication between physician and patient, but also among all...
This paper presents a new annotated corpus of 513 anonymized radiology reports written in Spanish. Reports were manually annotated with entities, negation and uncertainty terms and relations. The corpus was conceived as an evaluation resource for named entity recognition and relation extraction algorithms, and as input for the use of supervised met...
An important subtask in clinical text mining tries to identify whether a clinical finding is expressed as present, absent or unsure in a text. This work presents a system for detecting mentions of clinical findings that are negated or just speculated. The system has been applied to two different types of German clinical texts: clinical notes and di...
In this work we present a fine-grained annotation schema to detect named entities in German clinical data of chronically ill patients with kidney diseases. The annotation schema is driven by the needs of our clinical partners and the linguistic aspects of German language. In order to generate annotations within a short period, the work also present...
In relation extraction, a key process is to obtain good detectors that find relevant sentences describing the target relation. To minimize the necessity of labeled data for refining detectors, previous work successfully made use of BabelNet, a semantic graph structure expressing relationships between synsets, as side information or prior knowledge....
Recent years have seen a significant growth and increased usage of large-scale knowledge resources in both academic research and industry. We can distinguish two main types of knowledge resources: those that store factual information about entities in the form of semantic relations (e.g., Freebase), namely so-called knowledge graphs, and those that...
Recent research shows the importance of linking linguistic knowledge resources for the creation of large-scale linguistic data. We describe our approach for combining two English resources, FrameNet and sar-graphs, and illustrate the benefits of the linked data in a relation extraction setting. While FrameNet consists of schematic representations o...
The task of relation extraction is to recognize and extract relations between entities or concepts in texts. Dependency parse trees have become a popular source for discovering extraction patterns, which encode the grammatical relations among the phrases that jointly express relation instances. State-of-the-art weakly supervised approaches to relat...
Coreference resolution for event mentions enables extraction systems to process document-level information. Current systems in this area base their decisions on rich semantic features from various knowledge bases, thus restricting them to domains where such external sources are available. We propose a model for this task which does not rely on such...
Patterns extracted from dependency parses of sentences are a major source of knowledge for most state-of-the-art relation extraction systems, but can be of low quality in distantly supervised settings. We present a linguistic annotation tool that allows human experts to analyze and categorize automatically learned patterns, and to identify common e...
A new method is proposed and evaluated that improves distantly supervised learning of pattern rules for n-ary relation extraction. The new method employs knowledge from a large lexical semantic repository to guide the discovery of patterns in parsed relation mentions. It extends the induced rules to semantically relevant material outside the minima...
In this paper, we present a novel approach to joint word sense disambiguation (WSD) and entity linking (EL) that combines a set of complementary objectives in an extensible multi-objective formalism. During disambiguation the system performs continuous optimization to find optimal probability distributions over candidate senses. The performance of...
In the THESEUS Alexandria use case, information extraction (IE) has been intensively applied to extract facts automatically from unstructured documents, such as Wikipedia and online news, in order to construct ontology-based knowledge databases for advanced information access. In addition, IE is also utilized for analyzing natural language queries...
In this paper, we present a novel combination of two types of language resources dedicated to the detection of relevant relations (RE) such as events or facts across sentence boundaries. One of the two resources is the sar-graph, which aggregates for each target relation tens of thousands of linguistic patterns of semantically associated relations that...
This paper presents a new resource for the training and evaluation needed by relation extraction experiments. The corpus consists of annotations of mentions for three semantic relations: marriage, parent–child, siblings, selected from the domain of biographic facts about persons and their social relationships. The corpus contains more than one hund...
The article demonstrates how generic parsers in a minimally supervised information extraction framework can be adapted to a given task and domain for relation extraction (RE). For the experiments, two parsers that deliver n-best readings are included: (1) a generic deep-linguistic parser (PET) with a largely hand-crafted head-driven phrase structur...
Yochina is a mobile application for crosslingual and cross-cultural understanding. The core of the demonstrated app supports dialogues between English and Chinese and German and Chinese. The dialogue facility is connected with interactive language guides, culture guides and country guides. The app is based on a generic framework enabling such novel...
Web-scale relation extraction is a means for building and extending large repositories of formalized knowledge. This type of automated knowledge building requires a decent level of precision, which is hard to achieve with automatically acquired rule sets learned from unlabeled data by means of distant or minimal supervision. This paper shows how pr...
We present a large-scale domain-adaptive relation extraction (RE) system, which learns grammar-based RE rules from the Web by utilizing large numbers of known relation instances as seeds. The system detects not only binary but also n-ary relations such as events. Our goal is to discover rule sets large enough for the actual range of linguistic va...
The coexistence of Western and Eastern languages and cultures poses a true challenge for global mobility and communication in business and personal life. In this paper, we will describe mobile software applications that are built on top of multilingual and crosslingual technologies for overcoming language barriers, e.g., between English and Chinese...
This paper presents a system that builds on theoretical and experimental insights from linguistic pragmatics, uses novel techniques from computational linguistics and combines them with robust baseline technologies to provide intelligent Non Player Characters (NPCs), which naturally act and talk in a virtual world. Current NPCs still lack the neces...
In this paper, we propose to use dependency graphs rather than trees as the interface between a parser and the rule acquisition module of a relation extraction (RE) system. Dependency graphs are much more expressive than trees and can easily be adapted to the output representations of various parsers, in particular those with richer semantics. Our...
In this paper we present an information service system that allows users to search for the key players of requested technology areas and for their collaboration networks. This system utilizes information extraction and wrapper technologies for detecting persons, organizations, publications and patents as well as relationships among them. Furthermor...
In the virtual worlds of multi-user online games such as World of Warcraft and social platforms such as Second Life, Non-Player Characters (NPCs) have become an essential element. NPCs moderate the game plot, make the artificial world more vivid and create an immersive environment. They are also necessary to populate new worlds which otherwise would be...
The paper demonstrates how the generic parser of a minimally supervised information extraction framework can be adapted to a given task and domain for relation extraction (RE). For the experiments a generic deep-linguistic parser was employed that works with a largely hand-crafted head-driven phrase structure grammar (HPSG) for English. The output...
This paper demonstrates a web-based online system called META-DARE. META-DARE is built to help researchers gain insights into seed-based minimally supervised machine learning for relation extraction. META-DARE allows researchers and students to conduct experiments with an existing machine learning system called DARE (Xu et al., 2007). User...
This paper investigates the application of an existing seed-based minimally supervised learning algorithm to different social domains exhibiting different properties of the available data. A systematic analysis studies the respective data properties of the three domains including the distribution of the semantic arguments and their combinations. Th...
This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between nonplayer characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domain-specif...
This paper presents a novel system HENNA (Hybrid Person Name Analyzer) for identifying language origin and analyzing linguistic structures of person names. We conduct ME-based classification methods for the language origin identification and achieve very promising performance. We will show that word-internal character sequences provide surprisingly...
After several years of development, the vision of the Semantic Web is gradually becoming reality. Large data repositories have been created and offer semantic information in a machine-processable form for various domains. Semantic Web data can be published on the Web, gathered automatically, and reasoned about. All these developments open interesti...
This paper presents a new approach to improving relation extraction based on minimally supervised learning. By adding some limited closed-world knowledge for confidence estimation of learned rules to the usual seed data, the precision of relation extraction can be considerably improved. Starting from an existing baseline system we demonstrate that...
This paper presents a novel approach to dialogue act recognition employing multilevel information features. In addition to features such as context information and words in the utterances, the recognition task utilizes syntactic and semantic relations acquired by information extraction methods. These features are utilized by a Bayesian network clas...
This paper presents a novel approach to a self-learning agent who collects and learns new knowledge from the web and exchanges her knowledge via dialogues with the users. The application domain is gossip about celebrities in the music world. The agent can inform herself and update the acquired knowledge by observing the web. Fans of musicians can a...
The main contribution of this paper is a systematic analysis of a minimally supervised machine learning method for relation extraction grammars. The method is based on a bootstrapping approach in which the bootstrapping is triggered by semantic seeds. The starting point of our analysis is the pattern-learning graph which is a subgraph of the bipart...
This paper describes a self-learning software agent who collects and learns knowledge from the web and also exchanges her knowledge via dialogues with the users. The agent is built on top of information extraction, web mining, question answering and dialogue system technologies, and users can freely formulate their questions within the gossip domai...
This paper presents OMINE, an opinion mining system which aims to identify concepts such as products and their attributes, and analyze their corresponding polarities. Our work pioneers at linking extracted topic terms with dom