Conference Paper

Multiple Information Sources Cooperative Learning.

Conference: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009
Source: DBLP


Many applications face the problem of learning from an objective dataset while information from other auxiliary sources may be beneficial but cannot be directly integrated into the objective dataset for learning. In this paper, we propose an omni-view learning approach to enable learning from multiple data collections. The theme is to organize heterogeneous data sources into a unified table with a global data view. To achieve the omni-view learning goal, we assume that the objective dataset and the auxiliary datasets share some instance-level dependency structures. We then propose a relational k-means to cluster instances in each auxiliary dataset, such that the clusters can help build new features to capture correlations between the objective and auxiliary datasets. Experimental results demonstrate that omni-view learning helps build models which outperform the ones learned from the objective dataset only. Comparisons with the co-training algorithm further confirm that omni-view learning provides an alternative, yet effective, way for semi-supervised learning.
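The feature-construction step in the abstract can be illustrated with a minimal sketch: cluster the auxiliary dataset with k-means, then append each linked auxiliary instance's cluster membership to the objective table as new features, yielding one unified "omni-view" table. The synthetic data, the `links` array, and the cluster count are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distances of every point to every center -> nearest-center labels
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X_obj = rng.normal(size=(100, 5))        # objective dataset
X_aux = rng.normal(size=(100, 8))        # auxiliary dataset
links = rng.integers(0, 100, size=100)   # assumed instance-level links (obj -> aux)

labels, _ = kmeans(X_aux, k=4)
onehot = np.eye(4)[labels[links]]        # cluster membership as new features
X_omni = np.hstack([X_obj, onehot])      # unified table with a global data view
```

A model trained on `X_omni` then sees the objective features alongside cluster-derived features that encode the auxiliary correlations.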

  • ABSTRACT: In this paper, we propose a framework to build prediction models from data streams which contain both labeled and unlabeled examples. We argue that, due to the increasing data collection ability but limited resources for labeling, stream data collected at hand may have only a small number of labeled examples, whereas a large portion of the data remains unlabeled but can be beneficial for learning. Unleashing the full potential of the unlabeled instances for stream data mining is, however, a significant challenge, considering that even fully labeled data streams may suffer from concept drifting, and inappropriate use of the unlabeled samples may make the problem even worse. To build prediction models, we first categorize the stream data into four different categories, each of which corresponds to a situation where concept drifting may or may not exist in the labeled and unlabeled data. After that, we propose a relational k-means based transfer semi-supervised SVM learning framework (RK-TS3VM), which leverages labeled and unlabeled samples to build prediction models. Experimental results and comparisons on both synthetic and real-world data streams demonstrate that the proposed framework helps build prediction models more accurate than those of simpler approaches.
    ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009; 01/2009
  • ABSTRACT: The promise of distributed classification is to improve the classification accuracy of peers on their respective local data by using the knowledge of other peers in the distributed network. Although in reality the data across peers may be drastically different (in the distribution of observations and/or of the labels), current explorations implicitly assume that all learning agents receive data from the same distribution. We remove this simplifying assumption by allowing peers to draw from arbitrary data distributions over arbitrary spaces, thus formalizing the general problem of distributed classification. We find that this problem is difficult because it does not admit the state-of-the-art solutions in distributed classification. We also discuss the relation between the general problem and transfer learning, and show that transfer learning approaches cannot be trivially adapted to solve the problem. Finally, we present a list of open research problems in this challenging field.
    ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 14 December 2010; 01/2010
  • ABSTRACT: A new image classification method with a multiple feature-based classifier (MFC) is proposed in this paper. MFC does not use the entire feature vector extracted from the original data in a concatenated form to classify each datum; rather, it uses groups of features related to each feature vector separately. In the training stage, a confusion table is computed for each local classifier that uses a specific feature-vector group, yielding the accuracy of each local classifier; in the testing stage, the final classification result is obtained by applying weights corresponding to the confidence level of each local classifier. The proposed MFC algorithm is applied to the problem of image classification on a set of image data. The results demonstrate that the proposed MFC scheme can enhance the classification accuracy over individual classifiers that use a specific feature-vector group.
    ICDMW 2010, The 10th IEEE International Conference on Data Mining Workshops, Sydney, Australia, 14 December 2010; 01/2010
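The four-way categorization described in the first abstract above (concept drift present or absent in the labeled portion and in the unlabeled portion of a stream chunk) can be sketched as follows. The mean-shift drift indicator and its threshold are stand-in assumptions, since the abstract does not specify a drift detector:

```python
import numpy as np

def drift(ref, cur, tol=0.5):
    """Crude drift indicator: mean shift beyond tol.
    A stand-in for a real statistical test, which the abstract does not name."""
    return bool(np.linalg.norm(ref.mean(axis=0) - cur.mean(axis=0)) > tol)

def categorize_chunk(ref_lab, cur_lab, ref_unl, cur_unl):
    """Route a stream chunk to one of four (labeled-drift, unlabeled-drift)
    categories, mirroring the framework's case analysis."""
    return drift(ref_lab, cur_lab), drift(ref_unl, cur_unl)

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, (200, 3))       # reference distribution
same = rng.normal(0, 1, (200, 3))      # no drift
shifted = rng.normal(2, 1, (200, 3))   # mean-shifted, i.e. drifted

cat = categorize_chunk(ref, same, ref, shifted)  # drift only in the unlabeled part
```

Each of the four resulting categories would then select a different strategy for combining the labeled and unlabeled samples when training the model.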
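The MFC scheme in the last abstract above (one local classifier per feature-vector group, with accuracy-derived weights combining their votes) can be sketched using nearest-centroid local classifiers. The feature groups, the synthetic data, the centroid classifier, and the use of plain training accuracy as the confusion-table confidence are all illustrative assumptions:

```python
import numpy as np

def centroid_fit(X, y):
    """Train a nearest-centroid local classifier on one feature group."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def centroid_predict(X, classes, cents):
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y_tr = np.array([0] * 50 + [1] * 50)             # labels assumed to be 0..K-1
X_te = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(3, 1, (10, 4))])
y_te = np.array([0] * 10 + [1] * 10)

groups = [slice(0, 2), slice(2, 4)]              # hypothetical feature-vector groups

# Training stage: one local classifier per group, weighted by its training
# accuracy (a simple proxy for the confidence drawn from the confusion table).
models, weights = [], []
for g in groups:
    classes, cents = centroid_fit(X_tr[:, g], y_tr)
    pred = centroid_predict(X_tr[:, g], classes, cents)
    models.append((classes, cents))
    weights.append((pred == y_tr).mean())

# Testing stage: weighted vote over the local classifiers' predictions.
votes = np.zeros((len(X_te), len(np.unique(y_tr))))
for (classes, cents), w, g in zip(models, weights, groups):
    pred = centroid_predict(X_te[:, g], classes, cents)
    votes[np.arange(len(X_te)), pred] += w
final = votes.argmax(axis=1)
```

The weighted vote lets an accurate local classifier dominate the decision, which is the mechanism the abstract credits for improving on any single feature group.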

