Conference Paper

Rule Ensemble Learning using Hierarchical Kernels on Structured Output Spaces

... The goal is to investigate the effectiveness of using relational features in the input space of a max-margin based sequence labeling model. Our work is based on StructHKL (Nair et al., 2012) and standard StructSVM formulations. We propose two techniques to learn a richer sequence labeling model by using the relational features discussed above. ...
... 1. Optimally (Han and Wang 2009; Nowozin et al. 2007; Kudo et al. 2004) or heuristically (Joshi 2008; Saha et al. 2012; Specia et al. 2009; Ramakrishnan et al. 2007; Specia et al. 2006; Nagesh et al. 2012; Chalamalla et al. 2008) solve a discrete optimization problem. 2. Optimally (Jawanpuria et al. 2011; Nair et al. 2012) solve a convex optimization problem with sparsity-inducing regularizers; ...
Article
A particularly successful role for Inductive Logic Programming (ILP) is as a tool for discovering useful relational features for subsequent use in a predictive model. Conceptually, the case for using ILP to construct relational features rests on treating these features as functions, the automated discovery of which necessarily requires some form of first-order learning. Practically, there are now several reports in the literature that suggest that augmenting any existing features with ILP-discovered relational features can substantially improve the predictive power of a model. While the approach is straightforward enough, much still needs to be done to scale it up to explore more fully the space of possible features that can be constructed by an ILP system. This space is, in principle, infinite and, in practice, extremely large. Applications have been confined to heuristic or random selections from this space. In this paper, we address this computational difficulty by allowing features to be constructed in a distributed manner. That is, there is a network of computational units, each of which employs an ILP engine to construct some small number of features and then builds a (local) model. We then employ a consensus-based algorithm, in which neighboring nodes share information to update local models. For a category of models (those with convex loss functions), it can be shown that the algorithm will result in all nodes converging to a consensus model. In practice, it may be slow to achieve this convergence. Nevertheless, our results on synthetic and real datasets suggest that in a relatively short time the "best" node in the network reaches a model whose predictive accuracy is comparable to that obtained using more computational effort in a non-distributed setting (the best node is identified as the one whose weights converge first).
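The consensus idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the ring topology, the scalar weight per node, the quadratic local loss, the mixing weights (1/2, 1/4, 1/4) and the diminishing step size are all assumptions chosen for illustration, and the ILP feature construction step is left out entirely. Each node averages its weight with its two neighbors and then takes a gradient step on its own convex loss.

```python
def consensus_gradient_descent(targets, num_rounds=2000):
    """Toy consensus scheme on a ring of nodes (illustrative assumptions only).

    Node i holds a scalar weight w_i and a local convex loss
    0.5 * (w_i - targets[i]) ** 2. The sum of the local losses is
    minimized at the mean of the targets.
    """
    n = len(targets)
    weights = [0.0] * n
    for k in range(num_rounds):
        step = 1.0 / (k + 2)  # diminishing step size (an assumption)
        # Consensus step: mix own weight with the two ring neighbors.
        mixed = [
            0.5 * weights[i]
            + 0.25 * weights[(i - 1) % n]
            + 0.25 * weights[(i + 1) % n]
            for i in range(n)
        ]
        # Local step: gradient of the node's own quadratic loss.
        weights = [
            mixed[i] - step * (weights[i] - targets[i]) for i in range(n)
        ]
    return weights


if __name__ == "__main__":
    local_targets = [1.0, 2.0, 3.0, 6.0]  # hypothetical local optima
    final = consensus_gradient_descent(local_targets)
    print(final)  # every entry should be close to the mean, 3.0
```

With a fixed step size the nodes would only agree approximately, which is why the sketch shrinks the step over time.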
... The applicability of StructHKL in learning complex relational features that are derived from inputs at different relative positions in a sequence is non-trivial and challenging. Therefore, from our feature categorization, we identify simpler categories that can be ANDed to yield complex ones, with the goal of formulating efficient, yet effective relational feature learning procedures. ... (Footnote 2: This part of our work has appeared at AAAI, 2012 [21]. Footnote 3: Structured output where each node in the structure can possibly be labeled as one of multiple class/label)
Article
Discovering relational structure between input features in sequence labeling models has been shown to improve their accuracy in several problem settings. However, the search space of relational features is exponential in the number of basic input features. Consequently, approaches that learn relational features tend to follow a greedy search strategy. In this paper, we study the possibility of optimally learning and applying discriminative relational features for sequence labeling. For learning features derived from inputs at a particular sequence position, we propose a Hierarchical Kernels-based approach (referred to as Hierarchical Kernel Learning for Structured Output Spaces - StructHKL). This approach optimally and efficiently explores the hierarchical structure of the feature space for problems with structured output spaces such as sequence labeling. Since the StructHKL approach has limitations in learning complex relational features derived from inputs at relative positions, we propose two solutions to learn relational features, namely (i) enumerating simple component features of complex relational features and discovering their compositions using StructHKL, and (ii) leveraging relational kernels, which compute the similarity between instances implicitly, in the sequence labeling problem. We perform extensive empirical evaluation on publicly available datasets and record our observations on the settings in which certain approaches are effective.
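To make the size of this search space concrete, the following sketch enumerates every non-empty conjunction of a small set of basic features, which gives 2^n - 1 candidates for n features. It only illustrates the exhaustive space that greedy methods avoid and is not StructHKL itself; the feature names and both helper functions are hypothetical.

```python
from itertools import combinations


def enumerate_conjunctions(basic_features):
    """Return every non-empty conjunction of the basic features, smallest first."""
    conjunctions = []
    for size in range(1, len(basic_features) + 1):
        for combo in combinations(basic_features, size):
            conjunctions.append(combo)
    return conjunctions


def conjunction_holds(conjunction, observation):
    """A conjunction fires only if all of its component features are true."""
    return all(observation.get(name, False) for name in conjunction)


if __name__ == "__main__":
    # Hypothetical basic features observed at one sequence position.
    basic = ["is_capitalized", "prev_is_title", "in_gazetteer"]
    lattice = enumerate_conjunctions(basic)
    print(len(lattice))  # 2**3 - 1 = 7
    observation = {"is_capitalized": True, "in_gazetteer": True}
    print([c for c in lattice if conjunction_holds(c, observation)])
```

Each conjunction of size k has k sub-conjunctions of size k - 1, so the candidates form a lattice ordered by inclusion; this kind of hierarchy is what a hierarchical approach can exploit instead of scoring every candidate.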
Article
The Indian Institute of Technology (IIT) Bombay has a history of research and development in the area of databases, dating back to the early 1980s. D. B. Phatak and N. L. Sarda were among the first faculty members at IIT Bombay to work in the area of database systems. The number of PhD students increased from around 1 or 2 enrolled at a time in the early 1990s to about 12 to 15 students at a time in recent years. While this number is much better than before, and is increasing rapidly, it is still small by most standards. However, the master's and bachelor's students have compensated for the shortage of PhD students and have made very significant contributions to the research efforts, with well over three-fourths of the papers having such students as coauthors. Graph data models are ubiquitous in semistructured search. Modeling a data graph as an electrical network, or equivalently, as a Markovian 'random surfer' process, is widely used in applications that need to characterize some notion of graph proximity.
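The 'random surfer' notion of graph proximity mentioned above can be sketched as a random walk with restart, computed by power iteration. This is a generic illustration rather than any specific system from IIT Bombay: the function name, the damping value of 0.85, the iteration count and the toy path graph are all assumptions, and every node is assumed to have at least one neighbor.

```python
def random_surfer_proximity(adjacency, seed, damping=0.85, num_iterations=100):
    """Proximity of every node to `seed` under a random walk with restart.

    At each step the surfer follows a uniformly chosen edge with
    probability `damping` and jumps back to `seed` otherwise.
    Assumes every node has at least one neighbor.
    """
    nodes = list(adjacency)
    scores = {node: (1.0 if node == seed else 0.0) for node in nodes}
    for _ in range(num_iterations):
        # Restart mass goes to the seed only.
        updated = {
            node: ((1.0 - damping) if node == seed else 0.0) for node in nodes
        }
        # Each node spreads its damped score evenly over its neighbors.
        for node in nodes:
            share = damping * scores[node] / len(adjacency[node])
            for neighbor in adjacency[node]:
                updated[neighbor] += share
        scores = updated
    return scores


if __name__ == "__main__":
    # Hypothetical undirected path graph a - b - c - d.
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(random_surfer_proximity(path, seed="a"))
```

On this toy path, nodes nearer the seed receive higher scores than nodes farther away, which is the intuition behind using such walks as a proximity measure.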
Conference Paper
Discovering relational structure between input features in sequence labeling models has been shown to improve their accuracy in several problem settings. The problem of learning relational structure for sequence labeling can be posed as learning Markov Logic Networks (MLN) for sequence labeling, which we abbreviate as Markov Logic Chains (MLC). This objective in propositional space can be solved efficiently and optimally by a Hierarchical Kernels-based approach, referred to as StructRELHKL, which we recently proposed. However, the applicability of StructRELHKL in complex first-order settings is non-trivial and challenging. We present the challenges and possibilities for optimally and simultaneously learning the structure as well as the parameters of MLCs (as opposed to learning them separately and/or greedily). Here, we look into leveraging the StructRELHKL approach for optimizing the MLC learning steps to the extent possible. To this end, we categorize first-order MLC features based on their complexity and show that complex features can be constructed from simpler ones. We define a self-contained class of features called absolute features (\(\mathcal{AF}\)), which can be conjoined to yield complex MLC features. Our approach first generates a set of relevant \(\mathcal{AF}\)s and then makes use of the algorithm for StructRELHKL to learn their optimal conjunctions. We demonstrate the efficiency of our approach by evaluating it on a publicly available activity recognition dataset.