Minlie Huang

Tsinghua University, Beijing, China

Publications (63) · 41.44 Total impact

  • Han Xiao · Minlie Huang · Yu Hao · Xiaoyan Zhu
    ABSTRACT: Knowledge graph embedding aims to offer a numerical paradigm for knowledge representation by translating entities and relations into a continuous vector space. This paper studies the problem of imprecise knowledge embedding and attributes it to a new issue, inaccuracy of truth characterization: existing methods cannot express true facts at a fine-grained level. To alleviate this issue, we propose an orbit-based embedding model, OrbitE. The new model is a well-posed algebraic system that expands the position of golden triples from a single point, as in current models, to a manifold. Extensive experiments show that the proposed model achieves substantial improvements over state-of-the-art baselines, especially for precise prediction.
    Article · Dec 2015
  • Han Xiao · Minlie Huang · Yu Hao · Xiaoyan Zhu
    ABSTRACT: Recently, knowledge graph embedding, which projects symbolic entities and relations into a continuous vector space, has become an active topic in artificial intelligence. This paper addresses a new issue, multiple relation semantics: a relation may have multiple meanings, revealed by the entity pairs associated with the corresponding triples. We propose a novel Gaussian mixture model for embedding, TransG. The new model can discover latent semantics for a relation and leverage a mixture of relation component vectors to embed a fact triple. To the best of our knowledge, this is the first generative model for knowledge graph embedding that is able to deal with multiple relation semantics. Extensive experiments show that the proposed model achieves substantial improvements over state-of-the-art baselines.
    Article · Sep 2015
  • Han Xiao · Minlie Huang · Yu Hao · Xiaoyan Zhu
    ABSTRACT: Knowledge representation is a major topic in AI, and many studies attempt to represent the entities and relations of knowledge bases in a continuous vector space. Among these attempts, translation-based methods build entity and relation vectors by minimizing the translation loss from a head entity to a tail entity. Despite their success, translation-based methods suffer from an oversimplified loss metric and cannot adequately model the diverse and complex entities and relations in knowledge bases. To address this issue, we propose TransA, an adaptive metric approach for embedding that uses ideas from metric learning to provide a more flexible embedding method. Experiments on benchmark datasets show that our method achieves significant and consistent improvements over state-of-the-art baselines.
    Article · Sep 2015
  • Jun Feng · Mantong Zhou · Yu Hao · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: Knowledge graph embedding refers to projecting the entities and relations of a knowledge graph into continuous vector spaces. State-of-the-art methods such as TransE, TransH, and TransR build embeddings by treating a relation as a translation from the head entity to the tail entity. However, previous models either cannot properly handle reflexive, one-to-many, many-to-one, and many-to-many relations, or lack scalability and efficiency. We therefore propose a novel method, flexible translation (TransF), to address these issues. TransF regards a relation as a translation between the head entity vector and the tail entity vector with flexible magnitude. To evaluate the proposed model, we conduct link prediction and triple classification on benchmark datasets. Experimental results show that our method remarkably improves performance compared with several state-of-the-art baselines.
    Article · May 2015
  • Biao Liu · Minlie Huang
    ABSTRACT: Prior knowledge has been shown to be very useful for many natural language processing tasks. Many approaches have been proposed to formalize various kinds of knowledge; however, whether a proposed approach is robust or sensitive to the knowledge supplied to the model has rarely been discussed. In this paper, we propose three regularization terms on top of generalized expectation criteria and conduct extensive experiments to justify the robustness of the proposed methods. Experimental results demonstrate that our methods obtain remarkable improvements and are much more robust than the baselines.
    Article · Mar 2015
  • Lei Fang · Qiao Qian · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: For online reviews, sentiment explanations are the sentences that may suggest detailed reasons for the sentiment; they are very important for review-mining applications such as opinion summarization. In this paper, we address the problem of ranking sentiment explanations by formulating it as two subproblems: sentence informativeness ranking and structural sentiment analysis. Tractable inference in joint prediction is performed through dual decomposition. Preliminary experiments on publicly available data demonstrate that our approach obtains promising performance.
    Article · Nov 2014
  • ABSTRACT: Automatic extraction of new words is an indispensable precursor to many NLP tasks such as Chinese word segmentation, named entity extraction, and sentiment analysis. This paper aims at extracting new sentiment words from large-scale user-generated content. We propose a fully unsupervised, purely data-driven framework for this purpose. We design statistical measures to quantify the utility of a lexical pattern and to measure the likelihood of a word being a new word. The method is almost free of linguistic resources (it requires only POS tags) and needs no elaborate linguistic rules. We also demonstrate how new sentiment words benefit sentiment analysis. Experimental results demonstrate the effectiveness of the proposed method.
    Conference Paper · Jun 2014
  • Lei Fang · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: In sentiment analysis, aspect-level review analysis is an important task because it can catalogue, aggregate, or summarize opinions according to a product's properties. In this paper, we explore a new concept for aspect-level review analysis, latent sentiment explanations, defined as a set of informative aspect-specific sentences whose polarities are consistent with that of the review. In other words, sentiment explanations best represent a review in terms of both aspect and polarity. We formulate the problem as a structural learning problem in which sentiment explanations are modeled with latent variables. Training samples are automatically identified through a set of predefined aspect signature terms (i.e., without manual annotation of samples), an approach we term weakly supervised. Our contributions are twofold: first, we formalize the use of aspect signature terms as weak supervision in a structural learning framework, which remarkably improves aspect-level analysis; second, aspect analysis and document-level sentiment classification mutually enhance each other's performance through joint modeling. The proposed method is evaluated on restaurant and hotel reviews, and experimental results demonstrate promising performance in both document-level and aspect-level sentiment analysis.
    Conference Paper · Oct 2013
  • Po Hu · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: Patents are critically important for a company to protect its core business concepts and proprietary technologies. Effective patent mining in massive patent databases not only provides business enterprises with valuable insights for developing strategies for research and development, intellectual property management, and product marketing, but also helps patent offices improve efficiency and optimize their patent examination processes. This paper describes the patent mining problem of automatically discovering core patents (i.e., novel and influential patents in a domain). In addition, the value of core patent mining is illustrated by revealing potential competitive relationships among companies, as reflected in their core patents. The work addresses the unique vocabulary usage of patents, which traditional word-based statistical methods do not consider, with a topic-based temporal mining approach that quantifies a patent's novelty and influence through variations in topic activeness. Tests of this method on real-world patent portfolios show its effectiveness over state-of-the-art methods.
    Article · Aug 2013 · Tsinghua Science & Technology
  • Hongwei Jin · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: More and more product reviews appear on e-commerce sites and microblog systems. This information is useful for consumers who want to know others' opinions of a product before purchasing, and for companies that want to learn the public sentiment toward their products. To make effective use of this information, this paper performs sentiment analysis on these multi-source reviews. First, we propose a binary classification framework based on product aspects; both explicit and implicit aspects are considered, and multiple feature-weighting schemes and classifiers are compared within the framework. Second, we use several machine learning algorithms to classify product reviews in microblog systems into positive, negative, and neutral classes, and find that one-versus-all SVMs (OVA-SVMs) perform best. Part of this work has been applied in a Chinese Product Review Mining System.
    Chapter · Jul 2012
  • Lei Fang · Minlie Huang
    ABSTRACT: In this paper, we present a structural learning model for joint sentiment classification and aspect analysis of text at various levels of granularity. Our model aims to identify highly informative, aspect-specific sentences in online customer reviews. The primary advantages of our model are twofold: first, it performs document-level and sentence-level sentiment polarity classification jointly; second, it is able to find informative sentences that are closely related to specific aspects of a review, which may be helpful for aspect-level sentiment analysis such as aspect-oriented summarization. The proposed method was evaluated on 9,000 Chinese restaurant reviews. Preliminary experiments demonstrate that our model obtains promising performance.
    Conference Paper · Jul 2012
  • Chong Long · Jie Zhang · Minlie Huang · Xiaoyan Zhu · Ming Li · Bin Ma
    ABSTRACT: Most participatory web sites collect overall ratings (e.g., five stars) of products from their customers, reflecting the overall assessment of the products. However, it is more useful to present ratings of product features (such as the price, battery, screen, and lens of digital cameras) to help customers make effective purchase decisions. Unfortunately, very few web sites collect feature ratings. In this paper, we propose a novel approach to accurately estimate feature ratings of products. This approach uses the information distance of reviews on the features to select user reviews that extensively discuss specific features of the products (called specialized reviews). Experiments on both annotated and real data show that the overall ratings of specialized reviews can be used to represent their feature ratings. Recommender systems can use the average of these overall ratings to provide feature-specific recommendations that better help users make purchasing decisions.
    Article · Jan 2012 · Knowledge and Information Systems
  • Lei Fang · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: Community-based question answering (cQA) services provide a convenient way for online users to share and exchange information and knowledge, which is highly valuable for information seeking. User interest and dedication motivate and drive the interactive question-answering process. In this paper, we address a key issue in cQA systems: routing newly asked questions to appropriate users who may provide high-quality answers. We incorporate answer quality and answer content to build a probabilistic question routing model. Our model is capable of 1) differentiating and quantifying the authority of users for different topics or categories and 2) routing questions to users with relevant expertise. Experimental results on a large collection of data from Wenwen demonstrate that our model is effective and has promising performance.
    Article · Jan 2012
  • Po Hu · Minlie Huang · Peng Xu · Weichang Li · Adam K Usadi · Xiaoyan Zhu
    ABSTRACT: Patents are critical for a company to protect its core technologies. Effective patent mining in massive patent databases can provide companies with valuable insights to develop strategies for IP management and marketing. In this paper, we study a novel patent mining problem of automatically discovering core patents (i.e., patents with high novelty and influence in a domain). We address the unique patent vocabulary usage problem, which is not considered in traditional word-based statistical methods, and propose a topic-based temporal mining approach to quantify a patent's novelty and influence. Comprehensive experimental results on real-world patent portfolios show the effectiveness of our method.
    Conference Paper · Jan 2012
  • Jie Tang · Bo Wang · Yang Yang · Po Hu · Yanting Zhao · Xinyu Yan · Bo Gao · Minlie Huang · Peng Xu · Weichang Li · others
    ABSTRACT: Patenting is one of the most important ways for a company to protect its core business concepts and proprietary technologies. Analyzing large volumes of patent data can uncover potential competitive or collaborative relations among companies in certain areas, which provides valuable information for developing intellectual property (IP), R&D, and marketing strategies. In this paper, we present a novel topic-driven patent analysis and mining system. Instead of merely searching over patent content, we focus on the heterogeneous patent network derived from the patent database, which is represented by several types of objects (companies, inventors, and technical content) jointly evolving over time. We design and implement a general topic-driven framework for analyzing and mining the heterogeneous patent network. Specifically, we propose a dynamic probabilistic model to characterize the topical evolution of these objects within the patent network. Based on this modeling framework, we derive several patent analytics tools that can be directly used for IP and R&D strategy planning, including a heterogeneous network co-ranking method, a topic-level competitor evolution analysis algorithm, and a method to summarize search results. We evaluate the proposed methods on a real-world patent database. The experimental results show that the proposed techniques clearly outperform the corresponding baseline methods.
    Conference Paper · Jan 2012
  • Hongtao Zhang · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: An enormous and growing number of gene-disease associations (GDAs) are buried in millions of research articles published over the years. Extracting them automatically is a challenging bioinformatics task. Although previous work has shown that supervised learning methods are superior for this task, their performance still relies on manually labeled training data. In this paper, we propose a solution that learns from abundant labeled protein-protein interaction (PPI) data and uses the learned knowledge to help extract GDAs. In particular, a support vector machine modified for corpus weighting (SVM-CW) is applied to weight labeled PPI data so that knowledge can be effectively transferred from the PPI domain to the GDA domain. The experimental results show that our solution makes full use of labeled PPI data and improves the performance of GDA extraction.
    Article · Dec 2011
  • Po Hu · Minlie Huang · Peng Xu · Weichang Li · Adam K. Usadi · Xiaoyan Zhu
    ABSTRACT: Although news readers can easily access a large number of news articles on the Internet, they can be overwhelmed by the quantity of available information, making it hard to get a concise, global picture of a news topic. In this paper, we propose a novel method to address this problem. Given a set of articles on a news topic, the proposed method models theme variation over time and identifies breakpoints, which are time points at which decisive changes occur. For each breakpoint, a brief summary is automatically constructed from the articles associated with that time point. The summaries are then ordered chronologically to form a timeline overview of the news topic, allowing readers to track news topics efficiently. Experiments on 15 popular topics from 2010 show the effectiveness of our approach and its advantages over other approaches.
    Conference Paper · Dec 2011
  • ABSTRACT: We report the Gene Normalization (GN) challenge in BioCreative III, where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth, we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
    Article · Oct 2011 · BMC Bioinformatics
  • Hongtao Zhang · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: A large number of protein-protein interactions (PPIs) are buried in the massive biomedical literature published over the years, which has motivated the development of automatic PPI extraction methods. However, existing methods based on supervised machine learning still face two challenges: (1) the feature space exploited in these methods is very sparse, and (2) the training data are imbalanced with respect to the categories to be classified. In this paper, we first construct rich and compact features to alleviate feature sparseness. With these features, our method outperforms baselines by up to 9.58% in F-score on the original AIMed corpus. Furthermore, we propose a data sampling strategy based on under-sampling to address the class imbalance problem: to re-balance the data distribution, samples of the majority class are iteratively removed according to the prediction results. In this way, our method achieves a further 2.49% improvement in F-score on the original AIMed corpus.
    Conference Paper · Oct 2011
  • Feng Jin · Minlie Huang · Xiaoyan Zhu
    ABSTRACT: Although the goal of traditional text summarization is to generate summaries with diverse information, most applications have no explicit definition of the information structure. It is therefore difficult to generate truly structure-aware summaries, because the information structure that should guide summarization is unclear. In this paper, we present a novel framework to generate guided summaries for product reviews. A guided summary has an explicitly defined structure derived from the important aspects of products. The proposed framework attempts to maximize expected aspect satisfaction during summary generation, and the importance of an aspect to a generated summary is modeled using Labeled Latent Dirichlet Allocation. Experimental results on consumer reviews of cars show the effectiveness of our method.
    Article · Jul 2011 · Journal of Computer Science and Technology
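Several of the knowledge graph embedding papers listed above (OrbitE, TransG, TransA, TransF) build on the translation principle of TransE. In that principle, a relation vector r translates a head entity vector h toward a tail entity vector t. The sketch below shows only that generic principle and does not reproduce any listed paper's model. It includes three pieces:

- a TransE-style L2 score ||h + r - t||;
- a hinge (margin ranking) loss against a corrupted triple;
- a weighted-metric variant. We assume this variant loosely resembles the adaptive-metric idea described for TransA; it is not that paper's formulation.

All function names, vectors, and the margin value are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def translation_score(h: Vector, r: Vector, t: Vector) -> float:
    """Generic TransE-style energy ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def weighted_translation_score(h: Vector, r: Vector, t: Vector,
                               w: Sequence[Sequence[float]]) -> float:
    """Hypothetical weighted variant: |e|^T W |e| with e = h + r - t in element-wise
    absolute value. With W = identity this equals the squared L2 distance."""
    e = [abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t)]
    return sum(e[i] * w[i][j] * e[j] for i in range(len(e)) for j in range(len(e)))


def margin_ranking_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the corrupted triple scores at least `margin` worse than the true one."""
    return max(0.0, margin + pos_score - neg_score)


# Usage with toy 2-D vectors (hypothetical values).
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
t_corrupt = [1.0, 0.5]
pos = translation_score(h, r, t)          # 0.0: h + r lands exactly on t
neg = translation_score(h, r, t_corrupt)  # 0.5
print(margin_ranking_loss(pos, neg))      # 0.5, because the gap 0.5 is below the margin 1.0
```

Training would push the loss to zero by moving vectors until every corrupted triple scores at least the margin above its true triple. The weighted variant only changes how the residual h + r - t is measured.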
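The PPI extraction abstract above describes re-balancing training data by iteratively removing majority-class samples according to prediction results. The abstract does not say which samples are removed or which classifier is used. The toy sketch below therefore makes two hypothetical assumptions:

- a nearest-centroid classifier is retrained each round;
- each round drops the majority sample that classifier labels most confidently as majority, until the class ratio reaches a target.

It illustrates iterative under-sampling in general, not the paper's procedure.

```python
from typing import List, Tuple

# (feature vector, label); label 1 = minority class, label 0 = majority class
Sample = Tuple[List[float], int]


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_confidence(x: List[float], c_maj: List[float], c_min: List[float]) -> float:
    """Hypothetical confidence: how much closer x is to the majority centroid than to the minority one."""
    return sq_dist(x, c_min) - sq_dist(x, c_maj)


def iterative_undersample(data: List[Sample], target_ratio: float = 1.0,
                          max_iter: int = 1000) -> List[Sample]:
    """Remove one majority sample per round until |majority| <= target_ratio * |minority|."""
    majority = [s for s in data if s[1] == 0]
    minority = [s for s in data if s[1] == 1]
    for _ in range(max_iter):
        if not minority or len(majority) <= target_ratio * len(minority):
            break
        # "Retrain" the toy classifier on the current, shrinking training set.
        c_maj = centroid([x for x, _ in majority])
        c_min = centroid([x for x, _ in minority])
        # Drop the majority sample predicted most confidently as majority (the easiest one).
        easiest = max(range(len(majority)),
                      key=lambda i: majority_confidence(majority[i][0], c_maj, c_min))
        majority.pop(easiest)
    return majority + minority


# Usage on toy 1-D data (hypothetical values).
data = [([0.0], 0), ([0.1], 0), ([0.2], 0), ([3.0], 0), ([4.0], 1), ([5.0], 1)]
balanced = iterative_undersample(data)
print(balanced)  # keeps majority samples [0.2] and [3.0], the two nearest the minority class
```

Dropping the easiest majority samples first keeps the borderline ones that matter most to the decision boundary. Random removal is the simpler alternative and may be just as plausible a reading of the abstract.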