Conference Paper

Towards Machine Learning on the Semantic Web

DOI: 10.1007/978-3-540-89765-1_17
Conference: Uncertainty Reasoning for the Semantic Web I, ISWC International Workshops, URSW 2005-2007, Revised Selected and Invited Papers
Source: DBLP


In this paper we explore some of the opportunities and challenges for machine learning on the Semantic Web. The Semantic Web provides standardized formats for the representation of both data and ontological background knowledge. Semantic Web standards are used to describe meta data but also have great potential as a general data format for data communication and data integration. Within a broad range of possible applications machine learning will play an increasingly important role: Machine learning solutions have been developed to support the management of ontologies, for the semi-automatic annotation of unstructured data, and to integrate semantic information into web mining. Machine learning will increasingly be employed to analyze distributed data sources described in Semantic Web formats and to support approximate Semantic Web reasoning and querying. In this paper we discuss existing and future applications of machine learning on the Semantic Web with a strong focus on learning algorithms that are suitable for the relational character of the Semantic Web's data structure. We discuss some of the particular aspects of learning that we expect will be of relevance for the Semantic Web such as scalability, missing and contradicting data, and the potential to integrate ontological background knowledge. In addition we review some of the work on the learning of ontologies and on the population of ontologies, mostly in the context of textual data.



Available from: Achim Rettinger
  • Source
    • "Our discussion follows the discussion in (Tresp et al., 2008), (Yu et al., 2005), and (Yu et al., 2006). Consider entities of a certain type (e.g., person) and the associated (subject, predicate, object) triples with these entities being the subjects. "
    ABSTRACT: In the Semantic Web vision of the World Wide Web, content will not only be accessible to humans but will also be available in machine-interpretable form as ontological knowledge bases. Ontological knowledge bases enable formal querying and reasoning and, consequently, a main research focus has been the investigation of how deductive reasoning can be utilized in ontological representations to enable more advanced applications. However, purely logical methods have not yet proven to be very effective, for several reasons: First, the problem of scaling reasoning to Web scale is still unsolved. Second, logical reasoning has problems with uncertain information, which is abundant in Semantic Web data due to its distributed and heterogeneous nature. Third, the construction of ontological knowledge bases suitable for advanced reasoning techniques is complex, which ultimately results in a lack of such expressive real-world data sets with large amounts of instance data. From another perspective, the more expressive structured representations open up new opportunities for data mining, knowledge extraction and machine learning techniques. If one accepts the idea that part of the knowledge already lies in the data, inductive methods appear promising, in particular since inductive methods can inherently handle noisy, inconsistent, uncertain and missing data. While there has been broad coverage of inducing concept structures from less structured sources (text, Web pages), as in ontology learning, given the problems mentioned above, we focus on new methods for dealing with Semantic Web knowledge bases, relying on statistical inference on their standard representations.
We argue that machine learning research has a wide variety of methods to offer that are applicable to different expressivity levels of Semantic Web knowledge bases: ranging from weakly expressive but widely available knowledge bases in RDF to highly expressive first-order knowledge bases, this paper surveys statistical approaches to mining the Semantic Web. We specifically cover similarity and distance-based methods, kernel machines, multivariate prediction models, relational graphical models and first-order probabilistic learning approaches and discuss their applicability to Semantic Web representations. Finally we present selected experiments which were conducted on Semantic Web mining tasks for some of the algorithms presented before. This is intended to show the breadth and general potential of this exciting new research and application area for data mining.
    Full-text · Article · May 2012 · Data Mining and Knowledge Discovery
  • Source
    • "The adoption of inductive approaches for ontology mining is mainly motivated by the necessity of: a) semi-automatize the mining of the assertional part of an ontology (i.e. the ontology population task); b) overcoming the limitations showed by deductive reasoning in the SW context [44] "
    ABSTRACT: Nowadays, building ontologies is a time-consuming task since they are mainly built manually. This hinders the full realization of the Semantic Web view. In order to overcome this issue, machine learning techniques, and specifically inductive learning methods, could be fruitfully exploited for learning models from existing Web data. In this paper we survey methods for (semi-)automatically building and enriching ontologies from existing sources of information such as Linked Data, tagged data, social networks, and ontologies. In this way, a large number of ontologies could quickly become available and would possibly only need to be refined by knowledge engineers. Furthermore, inductive incremental learning techniques could be adopted to perform reasoning at large scale, for which the deductive approach has shown its limitations. Indeed, incremental methods allow models to be learned from samples of data and then refined or enriched when new (samples of) data become available. While this means abandoning sound and complete reasoning procedures in favor of uncertain conclusions, it could make it possible to reason over the entire Web. Moreover, the adoption of inductive learning methods could also make it possible to deal with the intrinsic uncertainty characterizing the Web, which, by its nature, may contain incomplete and/or contradictory information.
    Full-text · Article · Jan 2010 · Semantic Web
  • Source
    ABSTRACT: In many Semantic Web domains a tremendous number of statements (expressed as triples) can potentially be true but, in a given domain, only a small number of statements is known to be true or can be inferred to be true. It thus makes sense to attempt to estimate the truth values of statements by exploring regularities in the Semantic Web data via machine learning. Our goal is a "push-button" learning approach that requires a minimum of user intervention. The learned knowledge is materialized off-line (at loading time) such that querying is fast. We define an extension of SPARQL for the integration of the learned probabilistic statements into querying. The proposed approach deals well with typical properties of Semantic Web data, i.e., with the sparsity of the data and with missing data. Statements that can be inferred via logical reasoning can readily be integrated into learning and querying. We study learning algorithms that are suitable for the resulting high-dimensional sparse data matrix. We present experimental results using a friend-of-a-friend data set.
    Full-text · Article ·
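
As a rough illustration of the "high-dimensional sparse data matrix" mentioned in the abstract above, the following minimal sketch maps (subject, predicate, object) triples to a binary matrix whose rows are subjects and whose columns are (predicate, object) pairs. The function name `triples_to_sparse_matrix`, the dict-of-dicts storage, and the sample triples are assumptions made for illustration only; the cited papers do not specify this representation.

```python
from collections import defaultdict


def triples_to_sparse_matrix(triples):
    """Map (subject, predicate, object) triples to a sparse binary matrix.

    Rows are subjects; columns are (predicate, object) pairs. Only statements
    known to be true are stored, so an absent entry means "unknown", not "false".
    """
    rows = sorted({s for s, _, _ in triples})
    cols = sorted({(p, o) for _, p, o in triples})
    row_index = {s: i for i, s in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}

    # Dict-of-dicts storage: matrix[i][j] = 1 for each known statement.
    matrix = defaultdict(dict)
    for s, p, o in triples:
        matrix[row_index[s]][col_index[(p, o)]] = 1
    return rows, cols, dict(matrix)


# Hypothetical friend-of-a-friend style triples (not taken from the paper).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
]
rows, cols, matrix = triples_to_sparse_matrix(triples)
```

A learning algorithm of the kind the abstract describes would then estimate scores for the missing entries of this matrix; that step is deliberately left out here.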